
Hadoop - High Availability Configuration

Hadoop is an open-source top-level project of the Apache Foundation: a software framework for distributed storage and computation. It keeps working as machines are added or removed, and it provides high availability, data replication, and related capabilities.

Basic machine information:

  1. Prepare five machines (two master nodes, three worker nodes):

     IP            FQDN                HOSTNAME  Purpose
     192.168.1.30  test30.example.org  test30    Master node (NameNode)
     192.168.1.31  test31.example.org  test31    Master node (ResourceManager)
     192.168.1.32  test32.example.org  test32    Worker node
     192.168.1.33  test33.example.org  test33    Worker node
     192.168.1.34  test34.example.org  test34    Worker node

  2. OS: Ubuntu 18.04

  3. Resources per machine:

    • CPU: 4 cores
    • RAM: 8 GB
    • Disk: 50 GB

Setup steps - high availability configuration:

Note
What this configuration adds:
1. The three Worker machines also serve as JournalNodes and ZooKeeper nodes
2. The original NameNode machine additionally runs a standby ResourceManager
3. The original ResourceManager machine additionally runs a standby NameNode

Before you begin, confirm that both HDFS and YARN are completely stopped!!
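One way to verify nothing is still running is to check jps on every node. This is a minimal sketch, assuming passwordless SSH as the hadoop user:

# List Java processes on all five nodes; only Jps itself should appear
for host in test30 test31 test32 test33 test34; do
  echo "== ${host} =="
  ssh hadoop@${host} jps
done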

1. Add the following properties to hdfs-site.xml, then SCP the file to the other machines

SCP usage example:

scp /usr/local/hadoop/etc/hadoop/hdfs-site.xml hadoop@test31:/usr/local/hadoop/etc/hadoop

nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
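To push the edited file to every remaining node in one pass, a loop like this sketch works (again assuming passwordless SSH):

# Copy hdfs-site.xml to the other four nodes
for host in test31 test32 test33 test34; do
  scp /usr/local/hadoop/etc/hadoop/hdfs-site.xml hadoop@${host}:/usr/local/hadoop/etc/hadoop/
done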

https://i.imgur.com/GNS3rEL.png

<property>
 <name>dfs.nameservices</name>
 <value>nncluster</value>
</property>
<property>
 <name>dfs.ha.namenodes.nncluster</name>
 <value>nn1,nn2</value>
</property>
<property>
 <name>dfs.namenode.rpc-address.nncluster.nn1</name>
 <value>test30.example.org:8020</value>
</property>
<property>
 <name>dfs.namenode.http-address.nncluster.nn1</name>
 <value>test30.example.org:9870</value>
</property>
<property>
 <name>dfs.namenode.rpc-address.nncluster.nn2</name>
 <value>test31.example.org:8020</value>
</property>
<property>
 <name>dfs.namenode.http-address.nncluster.nn2</name>
 <value>test31.example.org:9870</value>
</property>
<property>
 <name>dfs.namenode.shared.edits.dir</name>
 <value>qjournal://test32.example.org:8485;test33.example.org:8485;test34.example.org:8485/nncluster</value>
</property>
<property>
 <name>dfs.journalnode.edits.dir</name>
 <value>/home/hadoop/journalnode</value>
</property>
<property>
 <name>dfs.ha.fencing.methods</name>
 <value>shell(/bin/true)</value>
</property>
<property>
 <name>dfs.client.failover.proxy.provider.nncluster</name>
 <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

https://i.imgur.com/aCEx6Q9.png

2. Update core-site.xml, then SCP it to the other machines

nano /usr/local/hadoop/etc/hadoop/core-site.xml

https://i.imgur.com/Azjkjts.png

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://nncluster</value>
</property>
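As a quick sanity check that clients now resolve the HA nameservice (this only reads the configuration, so it is safe while the daemons are down):

hdfs getconf -confKey fs.defaultFS    # should print hdfs://nncluster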

3. Create the journalnode directory on the three JournalNode machines

mkdir ~/journalnode
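If passwordless SSH is available, the directory can also be created on all three hosts from one machine; a minimal sketch:

# Create the JournalNode edits directory on all three workers
for host in test32 test33 test34; do
  ssh hadoop@${host} 'mkdir -p ~/journalnode'
done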

4. Start the JournalNodes, then confirm with jps

hdfs --daemon start journalnode

https://i.imgur.com/b1kSI1P.png

5. On the active NameNode only

hdfs namenode -initializeSharedEdits

https://i.imgur.com/xW5B5gC.png

https://i.imgur.com/OXOt7UN.png

Success
Confirm that "Successfully started new epoch 1" appears.

Warning

If this cluster is brand new and has never been used, format it first!!!!!

hdfs namenode -format

6. Start the first NameNode

hdfs --daemon start namenode

https://i.imgur.com/xdJyQFV.png

7. Copy the metadata to the second NameNode

hdfs namenode -bootstrapStandby

https://i.imgur.com/0l6kVnO.png

https://i.imgur.com/O1vrqwg.png

Success
Confirm that "has been successfully formatted" appears.

8. Start the second NameNode

hdfs --daemon start namenode

https://i.imgur.com/3KeeCTa.png

9. Stop all the NameNodes, then start them again

stop-dfs.sh
start-dfs.sh

https://i.imgur.com/MMinzyz.png

Tip
Both NameNodes and all three JournalNodes stop and start together.

10. Transition the first NameNode to active, then check the states

hdfs haadmin -transitionToActive nn1
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

https://i.imgur.com/v1YsD2J.png

https://i.imgur.com/D4V8QSD.png

https://i.imgur.com/UpHbVKb.png

11. Start YARN

start-yarn.sh

https://i.imgur.com/xWkBrLl.png

12. Start the Job History Server

mapred --daemon start historyserver

https://i.imgur.com/TVU2ucE.png

13. Switch the active NameNode

hdfs haadmin -transitionToStandby nn1
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
hdfs haadmin -transitionToActive nn2
hdfs haadmin -getServiceState nn2

https://i.imgur.com/aJ9SloD.png

https://i.imgur.com/KbRR8DB.png

https://i.imgur.com/wuY18oT.png

14. Run a Pi job to check that the newly active NameNode works

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 30 100

https://i.imgur.com/nCIQeQS.png

https://i.imgur.com/kS10yNp.png

15. Download and install ZooKeeper (on all three ZooKeeper machines, as an administrator)

  1. Download ZooKeeper

wget http://ftp.tc.edu.tw/pub/Apache/zookeeper/zookeeper-3.5.6/apache-zookeeper-3.5.6-bin.tar.gz

Info
If the mirror link is dead, download it from the official Apache ZooKeeper site.

  2. Extract the archive

tar -xvf apache-zookeeper-3.5.6-bin.tar.gz -C /usr/local

  3. Rename the directory

mv /usr/local/apache-zookeeper-3.5.6-bin /usr/local/zookeeper

  4. Change the owner

chown -R hadoop:hadoop /usr/local/zookeeper
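Rather than repeating the download on every node, one shortcut is to install once and copy the tree over; a sketch, assuming root SSH access to the other two workers:

# Copy the installed tree from the first worker to the other two
for host in test33 test34; do
  scp -r /usr/local/zookeeper root@${host}:/usr/local/
  ssh root@${host} 'chown -R hadoop:hadoop /usr/local/zookeeper'
done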

16. Copy zoo_sample.cfg to zoo.cfg and edit it (then SCP it to the other two workers)

cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg
nano /usr/local/zookeeper/conf/zoo.cfg

https://i.imgur.com/8xmNuIt.png

https://i.imgur.com/aOTGRb7.png

dataDir=/usr/local/zookeeper/zoodata #modified
admin.serverPort=8010 #added
server.1=test32.example.org:2888:3888 #added
server.2=test33.example.org:2888:3888 #added
server.3=test34.example.org:2888:3888 #added
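A sketch of the SCP step from the heading, assuming zoo.cfg was edited on test32; the same loop works for zkEnv.sh in the next step:

# Push zoo.cfg to the other two ZooKeeper nodes
for host in test33 test34; do
  scp /usr/local/zookeeper/conf/zoo.cfg hadoop@${host}:/usr/local/zookeeper/conf/
done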

17. Edit zkEnv.sh (then SCP it to the other two workers)

nano /usr/local/zookeeper/bin/zkEnv.sh

https://i.imgur.com/ZOYfch7.png

# added
ZOO_LOG_DIR="/usr/local/zookeeper/logs"
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"

18. Create the data directory and the myid file

mkdir /usr/local/zookeeper/zoodata
echo "1" > /usr/local/zookeeper/zoodata/myid # on the first ZooKeeper node
echo "2" > /usr/local/zookeeper/zoodata/myid # on the second ZooKeeper node
echo "3" > /usr/local/zookeeper/zoodata/myid # on the third ZooKeeper node

  • The myid value must match the server.N entries in zoo.cfg

https://i.imgur.com/AWlnjAS.png
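A hypothetical one-pass alternative, run from a single machine (test32 gets 1, test33 gets 2, test34 gets 3, matching zoo.cfg; assumes the zoodata directory already exists on each host and passwordless SSH is set up):

for i in 1 2 3; do
  ssh hadoop@test$((31 + i)).example.org "echo ${i} > /usr/local/zookeeper/zoodata/myid"
done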

19. Update environment variables

  1. Edit .bashrc

nano ~/.bashrc

  2. Add the environment variables

export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin

https://i.imgur.com/JfKYvX2.png

  3. Reload the environment

source ~/.bashrc   # or: . ~/.bashrc

20. Start ZooKeeper (on all three machines)

zkServer.sh start
zkServer.sh status
jps

https://i.imgur.com/8U2BwW6.png

Info
While only one node is up, zkServer.sh status reports "It is probably not running."; this just means it cannot yet reach the other ZooKeeper nodes.

21. Stop the following services, in order

# stop the History Server
mapred --daemon stop historyserver
# stop the ResourceManagers
stop-yarn.sh
# stop HDFS
stop-dfs.sh

22. Add to hdfs-site.xml, then SCP it to the other machines

nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

https://i.imgur.com/Tpppyzd.png

<!-- added -->
<property>
 <name>dfs.ha.automatic-failover.enabled</name>
 <value>true</value>
</property>

23. Add to core-site.xml, then SCP it to the other machines

nano /usr/local/hadoop/etc/hadoop/core-site.xml

https://i.imgur.com/RwrXFIk.png

<!-- added -->
<property>
 <name>ha.zookeeper.quorum</name>
 <value>test32.example.org:2181,test33.example.org:2181,test34.example.org:2181</value>
</property>

24. On the NameNode only

Danger
Run this only on a machine where the NameNode runs!!

hdfs zkfc -formatZK

https://i.imgur.com/c3X1cZa.png

Confirm that "Successfully created /hadoop-ha/nncluster in ZK" appears.

25. Start HDFS (on the NameNode only)

start-dfs.sh

https://i.imgur.com/PXk8Lwq.png

Tip
The DFSZKFailoverController service will start automatically.

26. Test NameNode automatic failover (on the NameNode only)

hdfs --daemon stop namenode
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

https://i.imgur.com/Y6H2Ts4.png
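After the check, bring the stopped NameNode back; it should rejoin as the standby. A minimal follow-up sketch:

hdfs --daemon start namenode       # restart the NameNode stopped above
hdfs haadmin -getServiceState nn1  # one NameNode should now report active,
hdfs haadmin -getServiceState nn2  # the other standby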

27. Add and remove yarn-site.xml properties, then SCP the file to the other machines

https://i.imgur.com/2542fsi.png

<!-- delete this property -->

<property>
<name>yarn.resourcemanager.hostname</name>
<value>test31.example.org</value>
</property>

<!-- add these properties -->

<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>rmcluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>test31.example.org</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>test30.example.org</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>test31.example.org:8088</value>
</property>
<property>
 <name>yarn.resourcemanager.webapp.address.rm2</name>
 <value>test30.example.org:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>test32.example.org:2181,test33.example.org:2181,test34.example.org:2181</value>
</property>

28. Start the following services, in order

# start the ResourceManagers
start-yarn.sh
# start the History Server
mapred --daemon start historyserver

29. Test ResourceManager automatic failover (on the ResourceManager only)

yarn --daemon stop resourcemanager
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

https://i.imgur.com/FkLwtmR.png
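As with the NameNode test, restart the stopped ResourceManager afterwards; a minimal sketch:

yarn --daemon start resourcemanager  # restart the ResourceManager stopped above
yarn rmadmin -getServiceState rm1    # one should now report active,
yarn rmadmin -getServiceState rm2    # the other standby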


Warning

If spark-defaults.conf is configured to load JAR files for your applications, remember to update it: any HDFS path that pointed at a single NameNode host must now go through the nameservice instead.

nano /usr/local/spark/conf/spark-defaults.conf

https://i.imgur.com/JLblhkX.png
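A hypothetical example of the kind of change meant here, assuming spark.yarn.jars pointed at a JAR directory uploaded to HDFS (adjust to whatever your file actually contains):

# before: tied to a single NameNode
spark.yarn.jars hdfs://test30.example.org:8020/spark-jars/*.jar
# after: resolved through the HA nameservice
spark.yarn.jars hdfs://nncluster/spark-jars/*.jar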

Appendix - starting and stopping services

  1. Starting Hadoop services (with HA enabled)

# start ZooKeeper (on each ZooKeeper node)
zkServer.sh start

# start HDFS
start-dfs.sh

# start YARN
start-yarn.sh

# start the History Server
mapred --daemon start historyserver
  2. Stopping Hadoop services (with HA enabled)

# stop the History Server
mapred --daemon stop historyserver

# stop YARN
stop-yarn.sh

# stop HDFS
stop-dfs.sh

# stop ZooKeeper (on each ZooKeeper node)
zkServer.sh stop

