Hadoop Configuration Files
1. hadoop-env.sh
This file specifies the environment variables that affect the JDK used by the Hadoop daemon (bin/hadoop).
Because the Hadoop framework is written in Java and runs on the Java Runtime Environment, one of the most important environment variables in hadoop-env.sh is $JAVA_HOME. This variable points the Hadoop daemon to the Java installation path on the system.
This file is also used to set other parts of the Hadoop daemon execution environment, such as the heap size (HADOOP_HEAPSIZE), the Hadoop home directory (HADOOP_HOME), and the log file location (HADOOP_LOG_DIR).
Note: To keep the cluster setup easy to follow, we have configured only the parameters necessary to start a cluster.
2. core-site.xml
This file tells the Hadoop daemons where the NameNode runs in the cluster. It contains the configuration settings for Hadoop Core, such as the I/O settings that are common to HDFS and MapReduce.
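The NameNode location is given as a URI of the form hdfs://hostname:port (in Hadoop 1.x, the value of the fs.default.name property), where hostname and port are the machine and port on which the NameNode daemon runs and listens. The same setting tells the NameNode which IP address and port it should bind to. The commonly used port is 8020, and you can specify an IP address instead of a hostname.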
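As a rough illustration, a minimal core-site.xml for a Hadoop 1.x cluster might look like the sketch below. The hostname namenode.example.com is a placeholder assumed for this example, not a value from the original setup; the port is the commonly used 8020 mentioned above.

```xml
<?xml version="1.0"?>
<!-- core-site.xml: illustrative sketch, not the original author's file -->
<configuration>
  <property>
    <!-- Hadoop 1.x property name; later releases use fs.defaultFS instead -->
    <name>fs.default.name</name>
    <!-- namenode.example.com is a placeholder; substitute your NameNode host or IP -->
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```

An IP address can be used in place of the hostname in the value.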
3. hdfs-site.xml
This file contains the configuration settings for the HDFS daemons: the NameNode, the Secondary NameNode, and the DataNodes.
You can also configure hdfs-site.xml to specify the default block replication and permission checking on HDFS. The actual number of replicas can also be specified when a file is created; the default is used if no replication factor is given at create time.
The value “true” for the property ‘dfs.permissions’ enables permission checking in HDFS, and the value “false” turns it off. Switching from one value to the other does not change the mode, owner, or group of files or directories.
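The two settings described above could be written roughly as follows. This is a sketch under assumptions: dfs.replication is the standard property for the default replication factor, and the value 3 is chosen only for illustration.

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml: illustrative sketch with example values -->
<configuration>
  <property>
    <!-- Default number of replicas for a block when none is given at create time -->
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <!-- true enables permission checking in HDFS, false turns it off -->
    <name>dfs.permissions</name>
    <value>true</value>
  </property>
</configuration>
```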
4. mapred-site.xml
This file contains the configuration settings for the MapReduce daemons: the JobTracker and the TaskTrackers. The mapred.job.tracker parameter is a hostname (or IP address) and port pair on which the JobTracker listens for RPC communication. This parameter specifies the location of the JobTracker to the TaskTrackers and MapReduce clients.
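A minimal mapred-site.xml along these lines might look like the sketch below. Both the hostname jobtracker.example.com and the port 8021 are placeholders assumed for this example; use whatever host and port your JobTracker actually listens on.

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml: illustrative sketch with placeholder values -->
<configuration>
  <property>
    <!-- host:port pair on which the JobTracker listens for RPC communication -->
    <name>mapred.job.tracker</name>
    <!-- jobtracker.example.com and 8021 are assumptions, not values from the original setup -->
    <value>jobtracker.example.com:8021</value>
  </property>
</configuration>
```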
You can copy all four of the files explained above to all the DataNodes and the Secondary NameNode. Each copy can then be adjusted for any node-specific configuration, for example a different JAVA_HOME on one of the DataNodes.
5. Masters
This file tells the Hadoop daemon where the Secondary NameNode is located. The ‘masters’ file on the master server contains the hostname of each Secondary NameNode server.
6. Slaves
The ‘slaves’ file on the master node contains a list of hosts, one per line, that are to run the DataNode and TaskTracker servers.
The ‘slaves’ file on a slave server contains the IP address of that slave node only. It does not list any of the other DataNodes in the cluster.