Thursday, July 14, 2016

Installation of Hadoop 1.x

1. Configure the hostname
     vi /etc/hostname
2. Configure hostname-to-IP resolution
     vi /etc/hosts
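
For a single-node setup the two files can be as simple as the sample below; the IP address 192.168.1.10 is a placeholder for this machine's address, and the hostname node1.hadoop.com matches the one used in the configuration files later in this post.

# sample /etc/hostname and /etc/hosts entries (the IP is a placeholder)
echo "node1.hadoop.com" > /etc/hostname
echo "192.168.1.10 node1.hadoop.com node1" >> /etc/hosts
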
3. Create directory

mkdir /usr/java
mkdir /usr/hadoop
mkdir /usr/hadoop/data
mkdir /usr/hadoop/namenode
mkdir /usr/hadoop/tmp

4. Configure Java (install the JDK under /usr/java and set JAVA_HOME)
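
A rough sketch of this step, assuming a JDK tarball is used; the archive name and version below are placeholders, not values from the original post:

# Extract the JDK into /usr/java (archive name/version are placeholders)
tar -xzf jdk-7uXX-linux-x64.tar.gz -C /usr/java/
# Point JAVA_HOME at the extracted JDK, e.g. in /etc/profile or ~/.bashrc
export JAVA_HOME=/usr/java/jdk1.7.0_XX
export PATH=$PATH:$JAVA_HOME/bin

Hadoop 1.x also reads JAVA_HOME from conf/hadoop-env.sh inside the Hadoop directory, so it can be set there once Hadoop is extracted.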

5. Add a Hadoop group and user
groupadd hadoop
useradd -g hadoop hduser
passwd hduser

6. Extract the Hadoop tarball into /usr/hadoop
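
For example, for the 1.2.1 release referenced by HADOOP_HOME later in this post:

# Extract the Hadoop 1.2.1 tarball into /usr/hadoop
tar -xzf hadoop-1.2.1.tar.gz -C /usr/hadoop/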

7. Give hduser ownership of the Hadoop folder

chown -R hduser:hadoop /usr/hadoop

8. Set up passwordless SSH

ssh-keygen
cd .ssh/
cat id_rsa.pub >> authorized_keys
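
If the key-based login does not work right away, the usual culprit is permissions on the .ssh directory; tightening them and testing a loopback login is a quick sanity check:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# should log in without asking for a password
ssh localhost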

9. Configure the .bashrc file of hduser

export HADOOP_HOME=/usr/hadoop/hadoop-1.2.1/
export PATH=$PATH:$HADOOP_HOME/bin
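
Reload the file and confirm that the hadoop command is on the PATH:

source ~/.bashrc
hadoop version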


10. Configure the Hadoop configuration files (in $HADOOP_HOME/conf).

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/hadoop/data</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/usr/hadoop/namenode</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
    </property>
</configuration>


core-site.xml

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://node1.hadoop.com:54310</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>

mapred-site.xml

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>node1.hadoop.com:54311</value>
        <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
    </property>
</configuration>

masters and slaves files: add the hostname(s) into them, as shown below.
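
For this single-node setup both files simply contain the node's own hostname (in Hadoop 1.x, conf/masters lists the secondary namenode host and conf/slaves lists the datanode/tasktracker hosts):

echo "node1.hadoop.com" > $HADOOP_HOME/conf/masters
echo "node1.hadoop.com" > $HADOOP_HOME/conf/slaves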

11. Format the namenode

hadoop namenode -format -force

12. Run the start-all.sh script.
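
Once the script finishes, running jps as hduser should show the five Hadoop 1.x daemons:

jps
# expected: NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker (plus Jps itself)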

