Prerequisites
1. The relevant software has been uploaded and unpacked on every host, and JDK 1.6 or later is installed; see the previous article for details.
2. Configure passwordless SSH between the hosts. In essence, this means appending the local .ssh/id_rsa.pub file to .ssh/authorized_keys on the local host and on every remote host.
2.1 Configure passwordless login from the master to the other hosts; in theory this step alone is sufficient.
[hadoop@linux1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
/home/hadoop/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
24:be:da:90:5e:e3:ff:be:d1:4a:ce:f0:3c:55:01:3b hadoop@linux1
[hadoop@linux1 ~]$ cd .ssh
[hadoop@linux1 .ssh]$ ls
authorized_keys id_dsa id_dsa.pub id_rsa id_rsa.pub known_hosts
[hadoop@linux1 .ssh]$ cp id_rsa.pub authorized_keys
[hadoop@linux1 .ssh]$ ssh linux1
The authenticity of host 'linux1 (172.16.251.11)' can't be established.
RSA key fingerprint is ed:1a:0b:46:f2:08:75:c6:e5:05:25:d0:7b:25:c6:61.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'linux1,172.16.251.11' (RSA) to the list of known hosts.
Last login: Mon Dec 17 09:21:37 2012 from dtydb6
scp authorized_keys linux2:/home/hadoop/.ssh/
With the above configuration (repeating the scp for linux3 as well), SSH logins from linux1 to linux2 and linux3 no longer prompt for a password.
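As an alternative sketch, on distributions that ship the ssh-copy-id utility, the copy-and-append steps can be done in one command per target host (run as the hadoop user on linux1):
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@linux2
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@linux3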
2.2 Configure login from the other hosts; run the following commands on linux2 and linux3 respectively.
2.2.1 Run ssh-keygen -t rsa to generate the id_rsa.pub file.
2.2.2 scp id_rsa.pub linux1:/home/hadoop/.ssh/id_rsa.pub_linux2 (on linux3, name the copy id_rsa.pub_linux3)
2.2.3 On linux1, append the copied keys to authorized_keys:
cat id_rsa.pub_linux2 >> authorized_keys
cat id_rsa.pub_linux3 >> authorized_keys
2.2.4 scp the merged authorized_keys back to all the other hosts
scp authorized_keys linux2:/home/hadoop/.ssh/
scp authorized_keys linux3:/home/hadoop/.ssh/
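Note that sshd is strict about file permissions; if passwordless login still prompts for a password, it is worth making sure (on every host) that the .ssh directory and authorized_keys are not group- or world-writable, for example:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys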
2.3 Verify that passwordless SSH works
ssh linux2 date
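To check every host in one pass, a small loop such as the following (hostnames assumed to be linux1, linux2, linux3) should print each remote date without any password prompt:
for h in linux1 linux2 linux3; do ssh $h date; done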
3. Install Hadoop
tar -zxvf hadoop-1.0.4.tar.gz
Set the environment variables:
export JAVA_HOME=/usr/java/jdk1.7.0_07
PATH=$PATH:$HOME/bin:/monitor/apache-flume-1.2.0/bin:/hadoop/hadoop-1.0.4/bin
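A minimal sketch of making these settings permanent (the exact paths follow this article's layout and may differ on your hosts): append the two lines above to the hadoop user's ~/.bash_profile on every node and reload it:
echo 'export JAVA_HOME=/usr/java/jdk1.7.0_07' >> ~/.bash_profile
echo 'PATH=$PATH:$HOME/bin:/monitor/apache-flume-1.2.0/bin:/hadoop/hadoop-1.0.4/bin' >> ~/.bash_profile
source ~/.bash_profile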
The default parameters are defined in src/core/core-default.xml, src/hdfs/hdfs-default.xml, and src/mapred/mapred-default.xml; site-specific overrides go in the corresponding files under the conf directory.
3.1 conf/hadoop-env.sh configures runtime parameters for the Hadoop daemons
Set JAVA_HOME:
export JAVA_HOME=/usr/java/jdk1.7.0_07
3.2 conf/core-site.xml sets the NameNode URI
[hadoop@linux1 hadoop-1.0.4]$ vi conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://linux1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-1.0.4/var</value>
</property>
</configuration>
3.3 conf/mapred-site.xml: JobTracker configuration
[hadoop@linux1 hadoop-1.0.4]$ vi conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>linux1:9001</value>
</property>
</configuration>
3.4 HDFS configuration, mainly the storage paths for NameNode metadata and DataNode blocks
[hadoop@linux1 hadoop-1.0.4]$ vi conf/hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/name1, /home/hadoop/name2</value>
<description> </description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/data1, /home/hadoop/data2</value>
<description> </description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
3.5 Configure the masters and slaves files
[hadoop@linux1 conf]$ vi masters
linux1
[hadoop@linux1 conf]$ vi slaves
linux2
linux3
3.6 Distribute the finished configuration files and the software to the other hosts
[hadoop@linux1 conf]$ scp * linux3:/home/hadoop/hadoop-1.0.4/conf
Alternatively, package the whole hadoop directory and copy it to the other hosts.
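For example, a minimal sketch of copying the whole tree (assuming the same /home/hadoop/hadoop-1.0.4 layout on every node, as used throughout this article):
cd /home/hadoop
tar -zcf hadoop-1.0.4.tar.gz hadoop-1.0.4
scp hadoop-1.0.4.tar.gz linux2:/home/hadoop/
scp hadoop-1.0.4.tar.gz linux3:/home/hadoop/
ssh linux2 "cd /home/hadoop && tar -zxf hadoop-1.0.4.tar.gz"
ssh linux3 "cd /home/hadoop && tar -zxf hadoop-1.0.4.tar.gz"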
4. Run Hadoop
4.1 Format the NameNode
"conf/hdfs-site.xml" 21L, 566C written
[hadoop@linux1 hadoop-1.0.4]$ hadoop namenode -format
12/12/17 10:52:19 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = linux1/172.16.251.11
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.4
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
************************************************************/
12/12/17 10:52:19 INFO util.GSet: VM type = 64-bit
12/12/17 10:52:19 INFO util.GSet: 2% max memory = 19.33375 MB
12/12/17 10:52:19 INFO util.GSet: capacity = 2^21 = 2097152 entries
12/12/17 10:52:19 INFO util.GSet: recommended=2097152, actual=2097152
12/12/17 10:52:19 INFO namenode.FSNamesystem: fsOwner=hadoop
12/12/17 10:52:20 INFO namenode.FSNamesystem: supergroup=supergroup
12/12/17 10:52:20 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/12/17 10:52:20 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/12/17 10:52:20 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/12/17 10:52:20 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/12/17 10:52:20 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/12/17 10:52:20 INFO common.Storage: Storage directory /home/hadoop/name1 has been successfully formatted.
12/12/17 10:52:20 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/12/17 10:52:20 INFO common.Storage: Storage directory /home/hadoop/name2 has been successfully formatted.
12/12/17 10:52:20 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at linux1/172.16.251.11
************************************************************/
[hadoop@linux1 hadoop-1.0.4]$ cd /home/hadoop/name1
[hadoop@linux1 name1]$ ls -lrt
total 8
drwxrwxr-x 2 hadoop hadoop 4096 Dec 17 10:52 image
drwxrwxr-x 2 hadoop hadoop 4096 Dec 17 10:52 current
4.2 Start Hadoop
[hadoop@linux1 hadoop-1.0.4]$ bin/start-all.sh
starting namenode, logging to /home/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-namenode-linux1.out
linux3: starting datanode, logging to /home/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-datanode-linux3.out
linux2: starting datanode, logging to /home/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-datanode-linux2.out
linux1: starting secondarynamenode, logging to /home/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-secondarynamenode-linux1.out
starting jobtracker, logging to /home/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-jobtracker-linux1.out
linux2: starting tasktracker, logging to /home/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-tasktracker-linux2.out
linux3: starting tasktracker, logging to /home/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-tasktracker-linux3.out
The daemons can also be started separately by running start-dfs.sh and then start-mapred.sh.
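That is, on the master:
bin/start-dfs.sh
bin/start-mapred.sh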
4.3 After startup, use the jps command to check the running processes
[hadoop@linux1 logs]$ /usr/java/jdk1.7.0_07/bin/jps
17152 NameNode
17772 Jps
17315 SecondaryNameNode
17408 JobTracker
[hadoop@linux2 hadoop-1.0.4]$ /usr/java/jdk1.7.0_07/bin/jps
14355 DataNode
14867 Jps
14446 TaskTracker
The cluster status can also be checked in a web browser.
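With Hadoop 1.x defaults, the NameNode web UI listens on port 50070 and the JobTracker on port 50030, so (assuming no port overrides in your configuration) the cluster can be inspected at:
http://linux1:50070/   (HDFS / NameNode status)
http://linux1:50030/   (MapReduce / JobTracker status)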
References
http://blog.csdn.net/hguisu/article/details/7237395
http://hadoop.apache.org/docs/r1.0.4/cluster_setup.html