Hadoop is a top-level open-source project of the Apache Foundation: a software framework for distributed storage and computation. Machines can be added or removed as needed, and it provides high availability, data replication, and related capabilities.
Basic machine information:
- Five machines are prepared (two master nodes, three worker nodes)
IP | FQDN | HOSTNAME | Role
192.168.1.30 | test30.example.org | test30 | Master node (NameNode)
192.168.1.31 | test31.example.org | test31 | Master node (ResourceManager)
192.168.1.32 | test32.example.org | test32 | Worker node
192.168.1.33 | test33.example.org | test33 | Worker node
192.168.1.34 | test34.example.org | test34 | Worker node
- OS: Ubuntu 18.04
- Resource allocation:
  - CPU: 4 cores
  - RAM: 8 GB
  - Disk: 50 GB
Setup Steps - High Availability Configuration:
Note
Roles added in this section:
1. The three Worker machines will serve as JournalNodes and ZooKeeper nodes
2. The original NameNode machine additionally hosts a standby ResourceManager
3. The original ResourceManager machine additionally hosts a standby NameNode
Before starting, confirm that both HDFS and YARN have been stopped across the cluster!!
1. Add the properties below to hdfs-site.xml and SCP the file to the other machines
SCP usage example:
scp /usr/local/hadoop/etc/hadoop/hdfs-site.xml hadoop@test31:/usr/local/hadoop/etc/hadoop
nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.nameservices</name>
<value>nncluster</value>
</property>
<property>
<name>dfs.ha.namenodes.nncluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nncluster.nn1</name>
<value>test30.example.org:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.nncluster.nn1</name>
<value>test30.example.org:9870</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nncluster.nn2</name>
<value>test31.example.org:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.nncluster.nn2</name>
<value>test31.example.org:9870</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://test32.example.org:8485;test33.example.org:8485;test34.example.org:8485/nncluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/journalnode</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.nncluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
2. Update core-site.xml and SCP it to the other machines
nano /usr/local/hadoop/etc/hadoop/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://nncluster</value>
</property>
3. Create the journalnode directory on the three JournalNode machines (see the example command below)
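A minimal sketch, assuming the directory matches dfs.journalnode.edits.dir configured above and the command is run as the hadoop user on test32, test33, and test34:
mkdir -p /home/hadoop/journalnode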
4. Start the JournalNode daemons and confirm with jps
hdfs --daemon start journalnode
5. On the first (active) NameNode only
hdfs namenode -initializeSharedEdits
Success
Confirm that the message "Successfully started new epoch 1" appears
Warning
If this cluster is brand new and has never been used, format it first!!
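A hedged example for that case, assuming a completely fresh cluster: formatting is done once on the first NameNode before anything else (this erases any existing HDFS metadata):
hdfs namenode -format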
6. Start the first NameNode
hdfs --daemon start namenode
7. Bootstrap the standby NameNode (run on the second NameNode)
hdfs namenode -bootstrapStandby
Success
Confirm that the message "has been successfully formatted" appears
8. Start the second NameNode
hdfs --daemon start namenode
9. Stop all NameNodes, then start them again
stop-dfs.sh
start-dfs.sh
Tip
Both NameNodes and all three JournalNodes will be stopped and started together
10. Transition the first NameNode to active and check the state
hdfs haadmin -transitionToActive nn1
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
11. Start YARN (see the example below)
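A minimal sketch, assuming YARN is brought up with the same cluster script used later in this guide, run on the ResourceManager machine (test31):
start-yarn.sh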
12. Start the Job History Server
mapred --daemon start historyserver
13. Switch the active NameNode
hdfs haadmin -transitionToStandby nn1
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
hdfs haadmin -transitionToActive nn2
hdfs haadmin -getServiceState nn2
14. Run a Pi job to verify that the newly active NameNode works correctly
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 30 100
15. Download and install ZooKeeper (on all three ZooKeeper machines, as an administrator)
- Download ZooKeeper
wget http://ftp.tc.edu.tw/pub/Apache/zookeeper/zookeeper-3.5.6/apache-zookeeper-3.5.6-bin.tar.gz
- Extract the archive
tar -xvf apache-zookeeper-3.5.6-bin.tar.gz -C /usr/local
- Rename the directory
mv /usr/local/apache-zookeeper-3.5.6-bin /usr/local/zookeeper
- Change the owner
chown -R hadoop:hadoop /usr/local/zookeeper
16. Copy zoo_sample.cfg to zoo.cfg and edit it (then SCP it to the other two workers; see the example after the config below)
cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg
nano /usr/local/zookeeper/conf/zoo.cfg
dataDir=/usr/local/zookeeper/zoodata          # modified
admin.serverPort=8010                         # added
server.1=test32.example.org:2888:3888         # added
server.2=test33.example.org:2888:3888         # added
server.3=test34.example.org:2888:3888         # added
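A hedged example of distributing the file, assuming it was edited on test32 and the other two workers use the same paths and hadoop user:
scp /usr/local/zookeeper/conf/zoo.cfg hadoop@test33:/usr/local/zookeeper/conf/
scp /usr/local/zookeeper/conf/zoo.cfg hadoop@test34:/usr/local/zookeeper/conf/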
17. Edit zkEnv.sh (then SCP it to the other two workers)
nano /usr/local/zookeeper/bin/zkEnv.sh
# add
ZOO_LOG_DIR="/usr/local/zookeeper/logs"
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
18. Create the log/data directories and set each node's myid
mkdir /usr/local/zookeeper/zoodata
echo "1" > /usr/local/zookeeper/zoodata/myid   # on the first ZooKeeper node only
echo "2" > /usr/local/zookeeper/zoodata/myid   # on the second ZooKeeper node only
echo "3" > /usr/local/zookeeper/zoodata/myid   # on the third ZooKeeper node only
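The log directory referenced by ZOO_LOG_DIR in the previous step is not created by the commands above; if it is not created automatically at startup, it can be made on each node with:
mkdir -p /usr/local/zookeeper/logs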
19. Update the environment variables
- Edit .bashrc
- Add the following environment variables
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
- Reload the environment variables
source ~/.bashrc   # alternatively: . ~/.bashrc
20. Start ZooKeeper (on all three machines)
zkServer.sh start
zkServer.sh status
jps
Info
While only one node is up, checking the status will report "It is probably not running." This simply means there are no other ZooKeeper nodes to communicate with yet.
21. Stop the following services in order
# Stop the History Server
mapred --daemon stop historyserver
# Stop the ResourceManager
stop-yarn.sh
# Stop the NameNodes
stop-dfs.sh
22. Add the property below to hdfs-site.xml and SCP it to the other machines
nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<!-- add -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
23. Add the property below to core-site.xml and SCP it to the other machines
nano /usr/local/hadoop/etc/hadoop/core-site.xml
<!-- add -->
<property>
<name>ha.zookeeper.quorum</name>
<value>test32.example.org:2181,test33.example.org:2181,test34.example.org:2181</value>
</property>
24. Initialize the HA state in ZooKeeper (active NameNode only; see the example below)
Confirm that the message "Successfully created /hadoop-ha/nncluster in ZK" appears
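The original does not show the command for this step; a minimal sketch, assuming the message above comes from formatting the failover controller's znode in ZooKeeper, run on the active NameNode:
hdfs zkfc -formatZK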
25. Start the NameNodes (NameNode machines only; see the example below)
Tip
The DFSZKFailoverController service will be started automatically
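A hedged example, assuming the same cluster script as before is used; with dfs.ha.automatic-failover.enabled set, start-dfs.sh also brings up the ZKFC daemons:
start-dfs.sh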
26. Test automatic NameNode failover (NameNode machines only)
hdfs --daemon stop namenode
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
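After confirming that the surviving NameNode took over, the stopped NameNode can presumably be brought back with the same daemon command used earlier:
hdfs --daemon start namenode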
27. Add and remove the properties below in yarn-site.xml, then SCP it to the other machines
<!-- remove this property -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>test31.example.org</value>
</property>
<!-- add the following properties -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>rmcluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>test31.example.org</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>test30.example.org</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>test31.example.org:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>test30.example.org:8088</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>test32.example.org:2181,test33.example.org:2181,test34.example.org:2181</value>
</property>
28. Start the following services in order
# Start the ResourceManagers
start-yarn.sh
# Start the History Server
mapred --daemon start historyserver
29. Test automatic ResourceManager failover (ResourceManager machines only)
yarn --daemon stop resourcemanager
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
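Likewise, after confirming the failover, the stopped ResourceManager can presumably be restarted with:
yarn --daemon start resourcemanager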
Warning
If spark-defaults.conf was modified to load JAR files when running jobs, remember to update it accordingly (HDFS paths now go through the hdfs://nncluster nameservice)
nano /usr/local/spark/conf/spark-defaults.conf
Appendix - Starting and Stopping the Services
- Starting the Hadoop services (HA enabled)
# Start ZooKeeper
zkServer.sh start
# Start the NameNodes
start-dfs.sh
# Start the ResourceManagers
start-yarn.sh
# Start the History Server
mapred --daemon start historyserver
- Stopping the Hadoop services (HA enabled)
# Stop the History Server
mapred --daemon stop historyserver
# Stop the ResourceManagers
stop-yarn.sh
# Stop the NameNodes
stop-dfs.sh
# Stop ZooKeeper
zkServer.sh stop