Installing Hadoop in Cluster Mode

 
Prerequisites
1. On every host, upload and unpack the required software and install JDK 1.6 or later; see the previous post for details.

2. Configure passwordless SSH between the hosts. In essence this means appending the local .ssh/id_rsa.pub to .ssh/authorized_keys on the local machine and on every remote host.

2.1 Configure passwordless login from the master to the other hosts; in theory this step alone is sufficient.

[hadoop@linux1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
/home/hadoop/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
24:be:da:90:5e:e3:ff:be:d1:4a:ce:f0:3c:55:01:3b hadoop@linux1
[hadoop@linux1 ~]$ cd .ssh
[hadoop@linux1 .ssh]$ ls
authorized_keys id_dsa id_dsa.pub id_rsa id_rsa.pub known_hosts
[hadoop@linux1 .ssh]$ cp id_rsa.pub authorized_keys
[hadoop@linux1 .ssh]$ ssh linux1
The authenticity of host 'linux1 (172.16.251.11)' can't be established.
RSA key fingerprint is ed:1a:0b:46:f2:08:75:c6:e5:05:25:d0:7b:25:c6:61.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'linux1,172.16.251.11' (RSA) to the list of known hosts.
Last login: Mon Dec 17 09:21:37 2012 from dtydb6

scp authorized_keys linux2:/home/hadoop/.ssh/

After the above configuration, ssh from linux1 to linux2 and linux3 no longer prompts for a password.
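If login still prompts for a password after these steps, the usual cause is directory permissions: sshd ignores an authorized_keys file that is group- or world-writable. A quick, non-destructive fix to run on every host (the paths match the layout used above):

```shell
# Ensure ~/.ssh and authorized_keys have permissions sshd will accept.
# touch only creates the file if missing; existing keys are untouched.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
touch "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```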
2.2 Configure passwordless login from the other hosts; run the following on linux2 and linux3 respectively.
2.2.1 Run ssh-keygen -t rsa to generate the id_rsa.pub file
2.2.2 scp id_rsa.pub linux1:/home/hadoop/.ssh/id_rsa.pub_linux2
2.2.3 On linux1, append each host's key:
cat id_rsa.pub_linux2 >> authorized_keys
cat id_rsa.pub_linux3 >> authorized_keys
2.2.4 scp the merged file to all other hosts:
scp authorized_keys linux2:/home/hadoop/.ssh/
scp authorized_keys linux3:/home/hadoop/.ssh/
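Steps 2.2.2 through 2.2.4 boil down to concatenating every host's public key into one authorized_keys file and copying it everywhere. A local dry-run of the merge step (the key strings and the temp directory are stand-ins for the real per-host files):

```shell
# Simulate collecting three hosts' public keys into one authorized_keys.
workdir=$(mktemp -d)
printf 'ssh-rsa AAAA...key1 hadoop@linux1\n' > "$workdir/id_rsa.pub"
printf 'ssh-rsa AAAA...key2 hadoop@linux2\n' > "$workdir/id_rsa.pub_linux2"
printf 'ssh-rsa AAAA...key3 hadoop@linux3\n' > "$workdir/id_rsa.pub_linux3"
# Same effect as the cat >> commands above, done in one pass:
cat "$workdir/id_rsa.pub" "$workdir/id_rsa.pub_linux2" \
    "$workdir/id_rsa.pub_linux3" > "$workdir/authorized_keys"
wc -l < "$workdir/authorized_keys"
```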
2.3 Verify that passwordless SSH works:
ssh linux2 date


3. Install Hadoop
tar -zxvf hadoop-1.0.4.tar.gz
Set the environment variables:
export JAVA_HOME=/usr/java/jdk1.7.0_07
PATH=$PATH:$HOME/bin:/monitor/apache-flume-1.2.0/bin:/hadoop/hadoop-1.0.4/bin
The default parameters live in src/core/core-default.xml, src/hdfs/hdfs-default.xml and src/mapred/mapred-default.xml; site-specific overrides go in the files under the conf directory.
3.1 conf/hadoop-env.sh — runtime settings for the Hadoop daemons
Set JAVA_HOME:
export JAVA_HOME=/usr/java/jdk1.7.0_07
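Before going further it is worth confirming that JAVA_HOME actually points at a JDK on every node; a misconfigured path here is a common reason daemons fail to start. A small check (the path is the one assumed above; adjust to your install):

```shell
# Verify that $JAVA_HOME/bin/java exists and is executable.
JAVA_HOME=${JAVA_HOME:-/usr/java/jdk1.7.0_07}
if [ -x "$JAVA_HOME/bin/java" ]; then
  status="JDK found at $JAVA_HOME"
else
  status="no JDK at $JAVA_HOME"
fi
echo "$status"
```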
3.2 conf/core-site.xml — set the namenode URI
[hadoop@linux1 hadoop-1.0.4]$ vi conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://linux1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-1.0.4/var</value>
</property>
</configuration>

3.3 conf/mapred-site.xml — JobTracker address

[hadoop@linux1 hadoop-1.0.4]$ vi conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>linux1:9001</value>
</property>
</configuration>
3.4 conf/hdfs-site.xml — HDFS settings, mainly the storage paths for the name and data directories
[hadoop@linux1 hadoop-1.0.4]$ vi conf/hdfs-site.xml

<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/name1,/home/hadoop/name2</value>
<description> </description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/data1,/home/hadoop/data2</value>
<description> </description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
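Note that the comma-separated directory lists should contain no whitespace, and the directories should exist and be writable by the hadoop user before the daemons start. A sketch of pre-creating them (rooted in a temp directory here for illustration; on the cluster, create them under /home/hadoop directly):

```shell
# Pre-create the name/data directories listed in hdfs-site.xml.
root=$(mktemp -d)   # stand-in for /home/hadoop
for d in name1 name2 data1 data2; do
  mkdir -p "$root/$d"
  chmod 755 "$root/$d"
done
ls "$root"
```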

3.5 Configure the masters and slaves files
[hadoop@linux1 conf]$ vi masters

linux1

[hadoop@linux1 conf]$ vi slaves

linux2
linux3

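The start scripts read conf/slaves and run one remote command per listed host over ssh, which is why the passwordless login from step 2 matters. A minimal local illustration of that loop, using a stand-in slaves file:

```shell
# Read a slaves file and iterate over the hosts, as start-all.sh does.
slaves=$(mktemp)
printf 'linux2\nlinux3\n' > "$slaves"
started=""
while read -r host; do
  # On the real cluster this would be an ssh command per host.
  echo "would start datanode/tasktracker on $host"
  started="$started $host"
done < "$slaves"
```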
3.6 Distribute the finished configuration files (and the software) to the other hosts
[hadoop@linux1 conf]$ scp * linux3:/home/hadoop/hadoop-1.0.4/conf
The same scp must be repeated for linux2; alternatively, package the whole hadoop directory and copy it to every host.
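A local dry-run of the "package the whole tree" option: tar up the directory, then (on the real cluster) scp the archive to linux2 and linux3 and unpack it there. The paths below are temp-dir stand-ins for /home/hadoop:

```shell
# Package a miniature hadoop tree and unpack it elsewhere, verifying
# that the configuration survives the round trip.
src=$(mktemp -d)
mkdir -p "$src/hadoop-1.0.4/conf"
echo linux1 > "$src/hadoop-1.0.4/conf/masters"
tar -C "$src" -czf "$src/hadoop-1.0.4.tar.gz" hadoop-1.0.4
# On the cluster: scp "$src/hadoop-1.0.4.tar.gz" linux2:/home/hadoop/
dst=$(mktemp -d)   # stand-in for the remote /home/hadoop
tar -C "$dst" -xzf "$src/hadoop-1.0.4.tar.gz"
cat "$dst/hadoop-1.0.4/conf/masters"
```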


4. Running Hadoop

4.1 Format the namenode
"conf/hdfs-site.xml" 21L, 566C written
[hadoop@linux1 hadoop-1.0.4]$ hadoop namenode -format
12/12/17 10:52:19 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = linux1/172.16.251.11
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.4
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
************************************************************/
12/12/17 10:52:19 INFO util.GSet: VM type = 64-bit
12/12/17 10:52:19 INFO util.GSet: 2% max memory = 19.33375 MB
12/12/17 10:52:19 INFO util.GSet: capacity = 2^21 = 2097152 entries
12/12/17 10:52:19 INFO util.GSet: recommended=2097152, actual=2097152
12/12/17 10:52:19 INFO namenode.FSNamesystem: fsOwner=hadoop
12/12/17 10:52:20 INFO namenode.FSNamesystem: supergroup=supergroup
12/12/17 10:52:20 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/12/17 10:52:20 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/12/17 10:52:20 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/12/17 10:52:20 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/12/17 10:52:20 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/12/17 10:52:20 INFO common.Storage: Storage directory /home/hadoop/name1 has been successfully formatted.
12/12/17 10:52:20 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/12/17 10:52:20 INFO common.Storage: Storage directory /home/hadoop/name2 has been successfully formatted.
12/12/17 10:52:20 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at linux1/172.16.251.11
************************************************************/
[hadoop@linux1 hadoop-1.0.4]$ cd /home/hadoop/name1
[hadoop@linux1 name1]$ ls -lrt
total 8
drwxrwxr-x 2 hadoop hadoop 4096 Dec 17 10:52 image
drwxrwxr-x 2 hadoop hadoop 4096 Dec 17 10:52 current
4.2 Start Hadoop
[hadoop@linux1 hadoop-1.0.4]$ bin/start-all.sh
starting namenode, logging to /home/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-namenode-linux1.out
linux3: starting datanode, logging to /home/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-datanode-linux3.out
linux2: starting datanode, logging to /home/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-datanode-linux2.out
linux1: starting secondarynamenode, logging to /home/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-secondarynamenode-linux1.out
starting jobtracker, logging to /home/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-jobtracker-linux1.out
linux2: starting tasktracker, logging to /home/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-tasktracker-linux2.out
linux3: starting tasktracker, logging to /home/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-tasktracker-linux3.out

The daemons can also be started separately: start-dfs.sh for HDFS and start-mapred.sh for MapReduce.



4.3 After startup, check the daemons with jps
[hadoop@linux1 logs]$ /usr/java/jdk1.7.0_07/bin/jps
17152 NameNode
17772 Jps
17315 SecondaryNameNode
17408 JobTracker
[hadoop@linux2 hadoop-1.0.4]$ /usr/java/jdk1.7.0_07/bin/jps
14355 DataNode
14867 Jps
14446 TaskTracker

The cluster can also be inspected in a web browser: by default the NameNode UI is at http://linux1:50070 and the JobTracker UI at http://linux1:50030.

