Hadoop is a top-level open-source project of the Apache Foundation: a software framework for distributed storage and computation. The cluster can scale simply by adding or removing machines, and it provides high availability and data replication.
Machine information:
- Five machines are prepared (two master nodes, three worker nodes):

| IP | FQDN | HOSTNAME | Role |
| --- | --- | --- | --- |
| 192.168.1.30 | test30.example.org | test30 | Master node (NameNode) |
| 192.168.1.31 | test31.example.org | test31 | Master node (ResourceManager) |
| 192.168.1.32 | test32.example.org | test32 | Worker node |
| 192.168.1.33 | test33.example.org | test33 | Worker node |
| 192.168.1.34 | test34.example.org | test34 | Worker node |
- OS: Ubuntu 18.04
- Resources per machine:
  - CPU: 4 cores
  - RAM: 8 GB
  - Disk: 50 GB
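Name resolution for the FQDNs above must work on every node. One common approach (an assumption of this guide; internal DNS would also do) is to add the mappings to `/etc/hosts` on each machine:

```
192.168.1.30    test30.example.org    test30
192.168.1.31    test31.example.org    test31
192.168.1.32    test32.example.org    test32
192.168.1.33    test33.example.org    test33
192.168.1.34    test34.example.org    test34
```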
Setup steps - installation and configuration:
1. Download and install Hadoop (as administrator)
- Download

```shell
cd
wget http://ftp.tc.edu.tw/pub/Apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
```
- Extract the archive

```shell
tar -tvf hadoop-3.2.1.tar.gz              # list the archive contents first
tar -xvf hadoop-3.2.1.tar.gz -C /usr/local
```
- Rename the directory

```shell
mv /usr/local/hadoop-3.2.1 /usr/local/hadoop
```
- Change the owner of the directory and its files

```shell
chown -R hadoop:hadoop /usr/local/hadoop
```
2. Set the hadoop user's environment variables (as the hadoop user)
- Edit `.bashrc`

```shell
# Set HADOOP_HOME
export HADOOP_HOME=/usr/local/hadoop
# Set HADOOP_MAPRED_HOME
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
# Add Hadoop bin and sbin directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```
- Reload the configuration file

```shell
source ~/.bashrc   # or: . ~/.bashrc
```

- Verify the environment variables
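A quick way to verify the settings is to print the variable and confirm the Hadoop `bin` directory is on `PATH`. This is a standalone sketch, so the exports from `.bashrc` are repeated here rather than sourced:

```shell
# Re-create the .bashrc settings so this snippet runs standalone
export HADOOP_HOME=/usr/local/hadoop
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"

# Print the variable and confirm the bin directory is on PATH
echo "$HADOOP_HOME"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "hadoop bin is on PATH" ;;
  *)                      echo "hadoop bin is missing from PATH" ;;
esac
```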
3. Edit the Hadoop runtime environment script (as the hadoop user)

```shell
nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
```

```shell
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
```
4. Edit Hadoop core-site.xml (as the hadoop user)

```shell
nano /usr/local/hadoop/etc/hadoop/core-site.xml
```

```xml
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/data</value>
    <description>Temporary Directory.</description>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://test30.example.org</value>
    <description>Use HDFS as file storage engine</description>
</property>
```

Tip: Hadoop 3.2.0 and later ship a command for checking configuration file syntax (`hadoop conftest`); see the GitHub source for details.
5. Edit Hadoop mapred-site.xml (as the hadoop user)

```shell
nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
```

```xml
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
</property>
<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1638m</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx3276m</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>4096</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx3276m</value>
</property>
<property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>819</value>
</property>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>test32.example.org:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>test32.example.org:19888</value>
</property>
```
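The `-Xmx` values above are not arbitrary: they follow the common rule of thumb (an assumption of this guide, not a Hadoop requirement) that the JVM heap should be roughly 80% of the container memory, leaving headroom for off-heap usage:

```shell
# heap ≈ 80% of container memory (MB)
for mb in 2048 4096; do
  echo "container=${mb}MB -> -Xmx$((mb * 80 / 100))m"
done
# container=2048MB -> -Xmx1638m
# container=4096MB -> -Xmx3276m
```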
6. Edit Hadoop yarn-site.xml (as the hadoop user)

```shell
nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
```

```xml
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>6144</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>6144</value>
</property>
<property>
    <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
    <value>true</value>
</property>
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>3</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>test31.example.org</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```
7. Edit Hadoop hdfs-site.xml (as the hadoop user)

```shell
nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
```

```xml
<property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
    <description>The name of the group of super-users. The value should be a single group name.</description>
</property>
```
8. Create the Hadoop workers file (as administrator)

```shell
nano /usr/local/hadoop/etc/hadoop/workers
```
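The workers file lists one worker hostname per line. Based on the machine table above, it would contain:

```
test32.example.org
test33.example.org
test34.example.org
```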