Jokers's Blog

Nobility is the epitaph of the noble; baseness is the passport of the base



Protecting Kibana with Nginx as a Reverse Proxy

Posted on 2018-08-15 | Category: BigData |

Install Nginx; first install the EPEL repository

$ yum install epel-release
$ yum install nginx
$ systemctl start nginx
$ systemctl enable nginx

Install httpd-tools to set up HTTP basic authentication

$ yum install httpd-tools
  • Create a ".htpasswd" file to store credentials (usernames and passwords) in an encrypted format:

    $ htpasswd -c [filename] username
    

Configuring Nginx

$ vim /etc/nginx/nginx.conf
  • Find the “server” directive and change it like below:

    server {
        listen 80;
        server_name _;
        location ~ ^/kibana/(.*)$ {
            rewrite /kibana/(.*) /$1 break;
            proxy_pass http://127.0.0.1:9396;
            auth_basic "Restricted";
            auth_basic_user_file /etc/nginx/.htpasswd;
        }
    }
    
  • Check the Nginx configuration

    $ nginx -t
    
  • Restart Nginx

    $ systemctl restart nginx
    

Configuring Kibana

  • Open the configuration file

    $ vim /etc/kibana/kibana.yml
    
  • Add

    server.basePath: "/kibana"
    
  • Restart Kibana

    $ systemctl restart kibana
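
To verify that the proxy now guards Kibana, a quick check with Python requests (a hedged sketch; host, port, and credentials are placeholders for your own values) can confirm that anonymous requests are rejected and authenticated ones pass:

    # verify_kibana_proxy.py -- sketch, assumes the Nginx config above and the 'requests' package
    import requests

    base = "http://localhost/kibana/"          # Nginx listens on port 80 and strips the /kibana prefix

    # Without credentials Nginx should answer 401 Unauthorized
    r = requests.get(base)
    print(r.status_code)                       # expected: 401

    # With the htpasswd credentials the request is proxied to Kibana on 127.0.0.1:9396
    r = requests.get(base, auth=("username", "password"))
    print(r.status_code)                       # expected: 200 if Kibana is running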
    

Common Elasticsearch Operations

Posted on 2018-08-14 | Category: BigData |

Reindex

  1. Create a new index, copying the mappings and settings of the original index
  2. Set the new index's refresh_interval to -1 and number_of_replicas to 0; this makes reindexing more efficient

    PUT /twitter/_settings
    {
        "index" : {
            "number_of_replicas" : 0
        }
    }
    
    PUT /twitter/_settings
    {
        "index" : {
            "refresh_interval" : "-1"
        }
    }
    
  3. reindex api

    • Just leaving out version_type (as above) or setting it to internal will cause Elasticsearch to blindly dump documents into the target, overwriting any that happen to have the same type and id:

      POST _reindex
      {
          "source": {
              "index": "twitter"
          },
          "dest": {
              "index": "new_twitter",
              "version_type": "internal"
          }
      }
      
    • Setting version_type to external will cause Elasticsearch to preserve the version from the source, create any documents that are missing, and update any documents that have an older version in the destination index than they do in the source index

      POST _reindex
      {
          "source": {
              "index": "twitter"
          },
          "dest": {
              "index": "new_twitter",
              "version_type": "external"
          }
      }
      
    • Setting op_type to create will cause _reindex to only create missing documents in the target index. All existing documents will cause a version conflict:

      POST _reindex
      {
          "source": {
              "index": "twitter"
          },
          "dest": {
              "index": "new_twitter",
              "op_type": "create"
          }
      }
      
    • By default version conflicts abort the _reindex process, but you can just count them by setting "conflicts": "proceed" in the request body:

      POST _reindex
      {
          "conflicts": "proceed",
          "source": {
              "index": "twitter"
          },
          "dest": {
              "index": "new_twitter",
              "op_type": "create"
          }
      }
      
    • If a field is mapped as a completion suggester, reindexing or indexing will fail when the corresponding field is empty

      # value must have a length > 0
      # elasticsearch completion field does not support null values
      
      POST _reindex
      {
          "source": {
              "index": "twitter"
          },
          "dest": {
              "index": "new_twitter"
          },
          "script": {
              "source": "if (ctx._source.name == null || ctx._source.name == '') {ctx._source.tmname=' '}",
              "lang": "painless"
          }
      }
      
    • Kibana returns a timeout after 30s; append ?wait_for_completion=false to the request so the reindex runs as a background task instead

  4. Reset the refresh_interval and number_of_replicas

  5. Wait for the index status to become green
  6. Add aliases (steps 4-6 are scripted in the sketch below)
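
The post-reindex housekeeping in steps 4-6 can also be driven from a script. A minimal sketch, assuming the official elasticsearch Python client (6.x/7.x body-style API); index, alias, and settings values are placeholders for your own:

    # post_reindex.py -- sketch of steps 4-6, assuming the elasticsearch-py client
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")
    new_index = "new_twitter"      # placeholder names
    alias = "twitter_alias"

    # 4. restore refresh_interval and number_of_replicas to their original values
    es.indices.put_settings(
        index=new_index,
        body={"index": {"refresh_interval": "1s", "number_of_replicas": 1}},
    )

    # 5. block until the index (and its replicas) reach green
    es.cluster.health(index=new_index, wait_for_status="green", request_timeout=120)

    # 6. point the alias at the new index
    es.indices.put_alias(index=new_index, name=alias)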

Reindex task

  • Get the currently running reindex tasks

    GET _tasks?detailed=true&actions=*reindex
    
  • View a single task

    GET /_tasks/taskId:1
    
  • Cancel a task

    POST _tasks/task_id:1/_cancel
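
If Kibana keeps timing out, the same flow can be run from Python: start the reindex with wait_for_completion=false and poll the resulting task until it completes. A sketch, again assuming the elasticsearch-py client; host and index names are placeholders:

    # reindex_task.py -- asynchronous reindex plus task polling (sketch)
    import time
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # returns immediately with {"task": "<node_id>:<id>"} instead of waiting for completion
    resp = es.reindex(
        body={"source": {"index": "twitter"}, "dest": {"index": "new_twitter"}},
        wait_for_completion=False,
    )
    task_id = resp["task"]

    while True:
        status = es.tasks.get(task_id=task_id)
        if status.get("completed"):
            break
        print(status["task"]["status"]["created"], "/", status["task"]["status"]["total"], "documents reindexed")
        time.sleep(10)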
    

null_value

A null value cannot be indexed or searched; the null_value parameter lets you explicitly substitute a replacement value for null

PUT my_index
{
    "mappings": {
        "_doc": {
          "properties": {
            "status_code": {
              "type":       "keyword",
              "null_value": "NULL"
            }
          }
        }
    }
}
  • The null_value needs to be the same datatype as the field. For instance, a long field cannot have a string null_value.
  • The null_value only influences how data is indexed, it doesn’t modify the _source document.
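
A small sketch of that behaviour, assuming elasticsearch-py 6.x (to match the _doc mapping type above) and the my_index mapping: a document whose status_code is null can be found by searching for the literal keyword "NULL", while _source still stores null.

    # null_value_demo.py -- sketch against the my_index mapping above
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # status_code is null in the document itself
    es.index(index="my_index", doc_type="_doc", id=1, body={"status_code": None}, refresh=True)

    # the null is indexed as the keyword "NULL", so this term query matches the document ...
    hits = es.search(index="my_index", body={"query": {"term": {"status_code": "NULL"}}})
    print(hits["hits"]["total"])

    # ... but _source is untouched and still contains null
    print(es.get(index="my_index", doc_type="_doc", id=1)["_source"])   # {'status_code': None}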

Completion Suggester

  • mapping

    PUT music
    {
        "mappings": {
            "_doc" : {
                "properties" : {
                    "suggest" : {
                        "type" : "completion",
                        "analyzer" : "simple",
                        "search_analyzer" : "simple"
                    },
                    "title" : {
                        "type": "keyword"
                    }
                }
            }
        }   
    }
    
  • Query; returning _source affects query efficiency, so restrict it (here to the suggest field)

    POST music/_search?pretty
    {
        "_source": "suggest",
        "suggest": {
            "song-suggest" : {
                "prefix" : "nir",
                "completion" : {
                    "field" : "suggest"
                },
                "size":10
            }
        }
    }
    
  • Whether duplicate suggestions should be filtered out (defaults to false).

    The autocomplete suggester is document-oriented in Elasticsearch 5 and will thus return suggestions for all documents that match. In Elasticsearch 1 and 2, the autocomplete suggester automatically de-duplicated suggestions.

    POST music/_search?pretty
    {
        "_source": "suggest",
        "suggest": {
            "song-suggest" : {
                "prefix" : "nir",
                "completion" : {
                    "field" : "suggest"
                },
                "size":10,
                "skip_duplicates":true
            }
        }
    }
    

DSL

  • completion suggest sample

    s = s.suggest('my_sugestion', keyword, completion={ 'field' : 'name', 'size':100})
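
A slightly fuller sketch of the same call, assuming the elasticsearch_dsl package and the music index defined above (client setup and the suggestion name are assumptions):

    # completion_dsl.py -- sketch, elasticsearch_dsl completion suggester query
    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search

    client = Elasticsearch("http://localhost:9200")

    s = Search(using=client, index="music").source(["suggest"])
    s = s.suggest("song-suggest", "nir",
                  completion={"field": "suggest", "size": 10, "skip_duplicates": True})
    response = s.execute()

    for option in response.suggest["song-suggest"][0].options:
        print(option.text)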
    

Anti-Crawling Mechanism Updates for 中国裁判文书网 (China Judgements Online)

Posted on 2018-08-06 | Category: Spider |

About the update

On January 12, 2019, the anti-crawling mechanism was updated again. Recording it here for future reference.

  1. It changed again; I was not in the mood to dig into what exactly changed.
  2. Decompiled the mobile app and now request its API directly.
  3. A token is required, and the token expires.
  4. IPs get banned here too, but not as aggressively as on the web side.
  5. The nasty part is that each query condition returns only 20 records.
  6. So the workaround is to combine query conditions.

About the update

On December 16, 2018, the anti-crawling mechanism was updated again. Recording it here for future reference.

  1. They finally pulled out the big gun and started banning IPs, and the ban is very easy to trigger; even at one request every 30s I still could not get around it.
  2. The captcha changed to an irregular four-character mix of letters and digits, unlike before, when even a correct answer took a dozen submissions to pass. Now, once an IP is banned, the captcha still pops up, but passing it is useless because another one appears right away, so what is the captcha even for?
  3. Not sure about the number parameter; it seems it has to be passed again.
  4. Just buy proxies.

About the update

On November 29, 2018, the anti-crawling mechanism was updated again. Recording it here for future reference.

  1. When fetching the list, number is no longer a required parameter.

About the update

On August 18, 2018, the anti-crawling mechanism was updated again. Recording it here for future reference.

  1. When extracting detail-page links from the list page, the DocID is now encrypted; decrypting the DocID once is all that is needed.
  2. It is AES symmetric encryption; the key point is obtaining the distributed key (a generic decryption sketch follows this list).
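
For reference only, decrypting an AES-encrypted value in Python generally looks like the sketch below (using pycryptodome). The key, IV, mode, and encodings here are hypothetical placeholders; the site's actual key-distribution scheme is the hard part and is not reproduced here.

    # aes_decrypt_sketch.py -- generic AES decryption sketch; key/iv/mode/encoding are hypothetical
    from Crypto.Cipher import AES
    from Crypto.Util.Padding import unpad

    def decrypt_docid(ciphertext_hex: str, key: bytes, iv: bytes) -> str:
        cipher = AES.new(key, AES.MODE_CBC, iv)             # mode assumed; adjust to the real scheme
        plaintext = unpad(cipher.decrypt(bytes.fromhex(ciphertext_hex)), AES.block_size)
        return plaintext.decode("utf-8")

    # usage: decrypt_docid(encrypted_docid, key_from_response, iv_from_response)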

About the update

On August 15, 2018, the anti-crawling mechanism was updated again. Recording it here for future reference.

  1. The calculation of vl5x: the function that computes vl5x was modified.

About the update

On August 6, 2018, the anti-crawling mechanism was updated again. Recording it here for future reference.

  1. The calculation of vl5x: more related helper functions were added around the function that computes vl5x. The simplest way to troubleshoot is to make a request in the browser, grab vjkl5, run the existing decoded JS functions, and compare the result with the vl5x in the browser's form data (this check can be scripted, as sketched after this list).

  2. Requesting list pages within the first 100 pages too frequently triggers a captcha; passing it usually takes several attempts, generally fewer than ten.
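
The "compare against the browser" check in item 1 can be automated by running the already de-obfuscated JS from Python, e.g. with PyExecJS. A sketch; the file name vl5x.js and the function name getKey are placeholders for whatever the decoded script actually exposes:

    # check_vl5x.py -- sketch using PyExecJS; 'vl5x.js' and 'getKey' are placeholder names
    import execjs

    with open("vl5x.js", encoding="utf-8") as f:          # the decoded JS that computes vl5x
        ctx = execjs.compile(f.read())

    vjkl5 = "value copied from the browser cookie"        # placeholder
    vl5x = ctx.call("getKey", vjkl5)                      # call the site's vl5x function
    print(vl5x)                                           # compare with the vl5x in the browser form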

Scrapy+Scrapyd+Redis

Posted on 2018-07-27 | Category: Spider |

Scrapy

Installation

pip install scrapy
  • On some systems Twisted cannot be found; download the source manually, decompress it, and then install

    bzip2 -d Twisted-18.7.0.tar.bz2
    pip3 install Twisted-18.7.0.tar
    

Use the scrapy command to generate a new project

scrapy startproject myproject

myproject/
    scrapy.cfg            # deploy configuration file

    myproject/             # project's Python module, you'll import your code from here
        __init__.py

        items.py          # project items definition file

        middlewares.py    # project middlewares file

        pipelines.py      # project pipelines file

        settings.py       # project settings file

        spiders/          # a directory where you'll later put your spiders
            __init__.py

Create the first spider in the myproject/spiders directory

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

Run the spider from the project's root directory

scrapy crawl quotes

Scrapyd: managing Scrapy remotely

Installation

pip install scrapyd

For remote access, edit the configuration file site-packages/scrapyd/default_scrapyd.conf

bind_address = 0.0.0.0

Run

scrapyd

Deploying a Scrapy project to Scrapyd

  • eggifying the project
  • uploading the egg to Scrapyd via the addversion.json endpoint
  • Both manual steps above can be handled by the scrapyd-deploy tool provided by scrapyd-client

Install scrapyd-client

pip install scrapyd-client

Deploy

scrapyd-deploy <target> -p <project>
  • You can edit scrapy.cfg to set defaults so that target and project need not be specified each time

    [deploy:target]
    url = http://localhost:6800/
    username = scrapy
    password = secret
    project = myproject
    
  • Run

    scrapyd-deploy target
    

schedule a spider run

$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=spider2
{"status": "ok", "jobid": "26d1b1a6d6f111e0be5c001e648c57f8"}

Distributed crawling with scrapy-redis

Install Redis

yum install redis
pip install scrapy-redis

Configure Redis by editing /etc/redis.conf

  • Set an access password

    requirepass password
    
  • Comment out bind to allow remote access

    # bind 127.0.0.1 ::1
    

Edit the project's settings.py

# Enables scheduling storing requests queue in redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Ensure all spiders share same duplicates filter through redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Store scraped item in redis for post-processing.
ITEM_PIPELINES = {
    'scrapy_redis.pipelines.RedisPipeline': 300
}

# Specify the full Redis URL for connecting (optional).
# If set, this takes precedence over the REDIS_HOST and REDIS_PORT settings.
REDIS_URL = 'redis://user:password@host:6379'

Run the spider from the project's root directory

scrapy crawl quotes

Processing Items

All Items are stored in Redis under the key quotes:items; you can write a script to save them to a database, as sketched below.
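
A sketch of such a script, assuming the redis-py package. scrapy-redis pushes each item onto the quotes:items list as a JSON string, so the consumer just pops and decodes them; save_to_db is a placeholder for your own storage code.

# process_items_sketch.py -- pop serialized items from Redis and hand them to your own storage layer
import json

import redis


def save_to_db(item: dict) -> None:
    """Placeholder: insert the item into MySQL/Mongo/Elasticsearch/etc."""
    print(item)


def main() -> None:
    r = redis.StrictRedis(host="localhost", port=6379, password="password")
    while True:
        # BLPOP blocks until an item is available; returns (key, value) or None on timeout
        popped = r.blpop("quotes:items", timeout=30)
        if popped is None:
            continue
        _, raw = popped
        save_to_db(json.loads(raw))


if __name__ == "__main__":
    main()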

Spark

Posted on 2018-06-12 | Category: BigData |

Installation

  • Install Java

    $ java -version
    
  • Install Scala

    $ wget https://downloads.lightbend.com/scala/2.12.6/scala-2.12.6.rpm
    $ sudo rpm -ivh scala-2.12.6.rpm
    $ scala -version
    
  • Install Spark

    $ wget http://mirrors.hust.edu.cn/apache/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
    $ tar xvf spark-2.3.1-bin-hadoop2.7.tgz
    $ sudo mv spark-2.3.1-bin-hadoop2.7/* /usr/local/spark
    
    $ vim ~/.bashrc
    export PATH=$PATH:/usr/local/spark/bin
    $ source ~/.bashrc
    
    $ spark-shell
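
Besides spark-shell, the installation can be smoke-tested from Python. A minimal sketch, assuming PySpark is available (the bundled bin/pyspark or pip install pyspark):

    # spark_smoke_test.py -- run with spark-submit or inside pyspark
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("smoke-test").getOrCreate()
    # a trivial job: count the numbers 0..999
    print(spark.range(1000).count())   # expected: 1000
    spark.stop()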
    

Basic Scala Syntax

Posted on 2018-06-10 | Category: Scala |

Functions and Methods

scala> class C {
     |   var acc = 0
     |   def minc = { acc += 1 }
     |   val finc = { () => acc += 1 }
     | }
defined class C

scala> val c = new C
c: C = C@1af1bd6

scala> c.minc // calls c.minc()

scala> c.finc // returns the function as a value:
res2: () => Unit = <function0>

abstract class

  • Abstract classes cannot be instantiated

    scala> abstract class Shape{
    |   def getArea():Int
    | }
    defined class Shape
    

Traits

trait Car {
  val brand: String
}

trait Shiny {
  val shineRefraction: Int
}
class BMW extends Car {
  val brand = "BMW"
}

class BMW extends Car with Shiny {
  val brand = "BMW"
  val shineRefraction = 12
}
  • Prefer traits. It is convenient for a class to extend multiple traits, but it can extend only one abstract class.

  • If you need constructor parameters, use an abstract class. An abstract class can define a constructor with parameters, while a trait cannot. For example, you cannot write trait t(i: Int) {}; the parameter i is illegal.

Object

Scala classes cannot have static variables or methods. Instead, a Scala class can have what is called a singleton object, or sometimes a companion object.

scala> class Bar(foo: String)
defined class Bar

scala> object Bar{
     |   def apply(foo: String) = new Bar(foo)
     | }

The apply method, syntactic sugar

scala> class Foo{}
defined class Foo

scala> object FooMaker{}
defined object FooMaker

scala> object FooMaker{
     |   def apply() = new Foo
     | }
defined object FooMaker

scala> val newFoo = FooMaker()
newFoo: Foo = Foo@229cb4d8

Functions are objects

A function is an instance of a Function* trait, with apply defined as syntactic sugar

scala> object addOne extends Function1[Int, Int]{
     |   def apply(m: Int): Int = m + 1
     | }
defined object addOne

scala> addOne(1)
res0: Int = 2

case class

It only makes sense to define case classes if pattern matching is used to decompose data structures.

  • Can be instantiated without the new keyword

  • Nicer toString output

  • equals and hashCode implemented by default

  • Serializable by default

Pattern Matching

  • Basic usage

    scala> val times = 1
    times: Int = 1

    scala> times match{
    | case 1 => "one"
    | case 2 => "two"
    | case _ => "other"
    | }
    res2: String = one
  • With an if guard

    scala> val score = 70
    score: Int = 70

    scala> score match{
    | case s if (score >= 90) => "优秀"
    | case s if (score >= 70) => "良好"
    | case s if (score >= 60) => "及格"
    | case _ => "不及格"
    | }
    res5: String = 良好
  • Matching on types

    try {
      // something
    } catch {
      case e: IllegalArgumentException => // do something
      case e: FileNotFoundException => // do something
      case e: IOException => // do something
      case _: Exception => // do something
    }
  • Matching on case classes

    class Customer
    case class PersonalUser(name: String) extends Customer
    case class EnterpriseUser(name: String) extends Customer
    case class VipUser(name: String) extends Customer
    ...

    def service(c: Customer): Unit = {
      c match {
        case EnterpriseUser(_) => print("Enterprise")
        case PersonalUser(_) => print("Personal")
        case VipUser(_) => print("Vip")
        case _ => print("None")
      }
    }
  • Matching on Option

    Some and None are both children of Option

    def toInt(in: String): Option[Int] = {
      try {
        Some(Integer.parseInt(in.trim))
      } catch {
        case e: NumberFormatException => None
      }
    }

    toInt(someString) match {
      case Some(i) => println(i)
      case None => println("That didn't work.")
    }

Installing Hive 2

Posted on 2018-05-24 | Category: BigData |

Prerequisites

A compatible version of Hadoop is already installed, and HDFS and YARN run successfully

Extract the archive

$ sudo mv apache-hive-2.3.3-bin.tar.gz /usr/local/
$ cd /usr/local/
$ sudo tar -zxvf apache-hive-2.3.3-bin.tar.gz
$ sudo mv apache-hive-2.3.3-bin hive
$ sudo chown -R hadoop:hadoop hive

Configure environment variables

$ vim ~/.bashrc
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/local/hive/lib/*:.

$ source ~/.bashrc

Hive Configuration

$ cp conf/hive-env.sh.template conf/hive-env.sh
$ vim conf/hive-env.sh
HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf

$ cd /usr/local/hive
$ cp conf/hive-default.xml.template conf/hive-site.xml
$ vim conf/hive-site.xml
  • system:java.io.tmpdir & system:user.name

    <property>
        <name>system:java.io.tmpdir</name>
        <value>/usr/local/hive/tmp</value>
    </property>
    <property>
        <name>system:user.name</name>
        <value>jokers</value>
    </property>
    
  • MySQL connection settings

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://xx.xxx.xx.xx:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>Username to use against metastore database</description>
    </property>
    
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
        <description>password to use against metastore database</description>
    </property>
    
  • Hive warehouse location; check the default values of these settings

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>
    
    <property>
        <name>hive.exec.scratchdir</name>
        <value>/tmp/hive</value>
        <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username>is created, with ${hive.scratch.dir.permission}.</description>
    </property>
    
    <property>
        <name>hive.querylog.location</name>
        <value>${system:java.io.tmpdir}/${system:user.name}</value>
        <description>Location of Hive run time structured log file</description>
    </property>
    
  • Create the Hive warehouse directories with Hadoop

    $ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp/hive
    $ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
    $ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp/hive
    $ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
    
  • Download mysql-connector-java.jar

    $ sudo yum install mysql-connector-java.noarch
    $ cp /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar
    
  • Initialize the metastore schema in MySQL

    $ schematool -dbType mysql -initSchema
    
  • Start Hive

    $ hive
    

Importing MySQL data into Hive

  • Install Sqoop

  • Configure mysqldump (if using --direct)

    ln -fs /usr/local/mysql/bin/mysqldump /usr/bin
    
  • Add environment variables for Hadoop

    $ vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hive/conf/
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hive/lib/
    # Do not write these as /usr/local/hive/conf/* /usr/local/hive/lib/*
    
  • Import example

    sqoop import --connect jdbc:mysql://xx.xx.x.x:3306/databases --table table --username hive --password hive --direct --fields-terminated-by '|'  --warehouse-dir /user/hive/warehouse  --hive-import
    

Connecting to HiveServer2 remotely

  • Configure hive-site.xml

    <property>
        <name>hive.server2.thrift.client.user</name>
        <value>username</value>
        <description>Username to use against thrift client</description>
    </property>
    <property>
        <name>hive.server2.thrift.client.password</name>
        <value>password</value>
        <description>Password to use against thrift client</description>
    </property>
    <property>
        <name>hive.server2.transport.mode</name>
        <value>binary</value>
        <description>
            Expects one of [binary, http].
            Transport mode of HiveServer2.
        </description>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
        <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>0.0.0.0</value>
        <description>Bind host on which to run the HiveServer2 Thrift service.</description>
    </property>
    
  • Configure Hadoop's core-site.xml

    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <!--value>master</value-->
        <value>*</value>
    </property>
    
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <!--value>hadoop</value-->
        <value>*</value>
    </property>
    
  • Start the Hive service. Clients can use the beeline command line or a client tool such as SQL Developer, which requires downloading the Hive JDBC driver; see also the Python sketch below.

    nohup hiveserver2 >hive.log 2>&1 &
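
HiveServer2 can also be reached from Python. A sketch assuming the PyHive package (pip install "pyhive[hive]"); host, port, and user mirror the configuration above, and authentication details depend on your hive.server2 settings:

    # hiveserver2_client_sketch.py -- sketch, PyHive against the Thrift port configured above
    from pyhive import hive

    conn = hive.Connection(host="localhost", port=10000, username="hadoop")
    cursor = conn.cursor()
    cursor.execute("SHOW DATABASES")
    print(cursor.fetchall())
    conn.close()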
    

Troubleshooting

hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly

$ vim ~/.bashrc
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hive/lib/*
$ source ~/.bashrc

MySQL 5.7 Data Import and Export

Posted on 2018-05-18 | Category: DataBase |

Export

SELECT * INTO OUTFILE 'data.txt' FIELDS TERMINATED BY '|' FROM source_table;  

Import

LOAD DATA INFILE 'data.txt' INTO TABLE target_table FIELDS TERMINATED BY '|';

Notes

  • Since 5.7, MySQL has a global variable secure_file_priv that restricts import and export; check its value with

    show variables like '%secure%';
    
  • If the value is NULL, import and export are not allowed. If a path is specified, files can only be exported to and imported from that path. If the value is empty, there is no restriction. The error looks like this:

    ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement
    
  • To change the variable, add a line to /etc/my.cnf:

    secure-file-priv=/home/mysql_dump
    
  • Finally, change the ownership of the directory

    chown mysql:mysql /home/mysql_dump
    

Reference

With this method, 160 million records were exported in 18 minutes and 64.72 million records were imported in 46 minutes, an average of about 84.42 million records per hour.

Installing Hadoop 2

Posted on 2018-05-18 | Category: BigData |

Download Hadoop

Optionally create a new user

$ sudo useradd -m hadoop    # -m creates a home directory for the user under /home
$ sudo passwd hadoop

Extract the archive

$ sudo mv hadoop-2.9.1.tar.gz /usr/local/
$ cd /usr/local/
$ sudo tar -zxvf hadoop-2.9.1.tar.gz
$ sudo mv hadoop-2.9.1 hadoop
$ sudo chown -R hadoop:hadoop hadoop

Pseudo-distributed configuration

<?xml version="1.0"?>
<!-- core-site.xml -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost/</value>
    </property>
</configuration>

<?xml version="1.0"?>
<!-- hdfs-site.xml -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
       <name>dfs.name.dir</name>
       <value>file:///home/hadoop/hadoopinfra/hdfs/namenode </value>
    </property>
    <property>
       <name>dfs.data.dir</name>
       <value>file:///home/hadoop/hadoopinfra/hdfs/datanode </value>
    </property>
</configuration>

<?xml version="1.0"?>
<!-- mapred-site.xml -->
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

<?xml version="1.0"?>
<!-- yarn-site.xml -->
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>localhost</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Configure passwordless SSH login

Make sure ssh is installed

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 640 ~/.ssh/authorized_keys

Test

ssh localhost

Format HDFS

$ hdfs namenode -format

Starting and stopping the daemons

$ vim ~/.bashrc
export PDSH_RCMD_TYPE=ssh
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

$ start-dfs.sh
$ stop-dfs.sh
$ start-yarn.sh
$ stop-yarn.sh
(deprecated)$ mr-jobhistory-daemon.sh start historyserver
$ mapred --daemon start historyserver
  • Verify that startup succeeded

NameNode - http://localhost:50070/
yarn - http://localhost:8088/

Create HDFS directories

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>

Changing the SSH port

  • Edit /etc/ssh/sshd_config

    Port 22
    
  • Restart sshd

    $ sudo service sshd restart
    
  • Edit hadoop-env.sh

    $ export HADOOP_SSH_OPTS="-p <num>"
    

Errors when running start-dfs.sh

rcmd: socket: Permission denied
  • Check whether pdsh's default rcmd is ssh or rsh

    pdsh -q -w localhost
    
  • If it is rsh, change it to ssh by adding the following to ~/.bashrc

    export PDSH_RCMD_TYPE=ssh
    source ~/.bashrc
    
  • If it still fails with

    No such rcmd module "ssh"
    
  • Install pdsh-rcmd-ssh so that ssh becomes an available rcmd module

    dnf install pdsh-rcmd-ssh
    

Ranking

Posted on 2018-05-12 | Category: Others |

剑桥大学University of Cambridge
牛津大学University of Oxford
伦敦大学学院University College London
帝国理工学院Imperial College London
伦敦大学国王学院King’s College London
爱丁堡大学The University of Edinburgh
曼彻斯特大学The University of Manchester
伦敦政治经济学院The London School of Economics and Political Science
布里斯托大学University of Bristol
华威大学The University of Warwick
格拉斯哥大学University of Glasgow
杜伦大学Durham University
谢菲尔德大学The University of Sheffield
诺丁汉大学The University of Nottingham
伯明翰大学University of Birmingham
圣安德鲁斯大学University of St Andrews
利兹大学University of Leeds
南安普顿大学University of Southampton
伦敦大学玛丽女王学院Queen Mary University of London
兰卡斯特大学University of Lancaster
约克大学University of York
卡迪夫大学Cardiff University
阿伯丁大学University of Aberdeen
埃克塞特大学University of Exeter
巴斯大学University of Bath
纽卡斯尔大学Newcastle University
利物浦大学The University of Liverpool
雷丁大学University of Reading
英国女王大学Queen’s University Belfast
萨塞克斯大学The University of Sussex
拉夫堡大学Loughborough University
莱斯特大学University of Leicester
伦敦大学皇家霍洛威学院Royal Holloway University of London
邓迪大学University of Dundee
萨里大学University of Surrey
东英吉利大学University of East Anglia
思克莱德大学University of Strathclyde
伦敦大学亚非学院SOAS University of London
伦敦大学伯克贝克学院Birkbeck, University Of London
赫瑞·瓦特大学Heriot-Watt University
伦敦城市大学City University London
布鲁内尔大学Brunel University
埃塞克斯大学University of Essex
牛津布鲁克斯大学Oxford Brookes University
阿斯顿大学Aston University
肯特大学University of Kent
伦敦大学金史密斯学院Goldsmiths, University of London
