Tuesday, February 14, 2017

Decommissioning Slave Nodes

Hadoop provides a decommission feature for retiring a set of existing slave nodes (DataNodes, NodeManagers, or HBase RegionServers) without risking data loss. Slave nodes are frequently decommissioned for maintenance. As a Hadoop administrator, you will periodically decommission slave nodes, either to reduce the cluster size or to gracefully remove dying nodes.

Prerequisites
• Ensure that the following property is defined in your hdfs-site.xml file.
<property>
  <name>dfs.hosts.exclude</name>
  <value><HADOOP_CONF_DIR>/dfs.exclude</value>
  <final>true</final>
</property>

where <HADOOP_CONF_DIR> is the directory for storing the Hadoop configuration files.
For example, /etc/hadoop/conf.

• Ensure that the following property is defined in your yarn-site.xml file.

<property>
  <name>yarn.resourcemanager.nodes.exclude-path</name>
  <value><HADOOP_CONF_DIR>/yarn.exclude</value>
  <final>true</final>
</property>


where <HADOOP_CONF_DIR> is the directory for storing the Hadoop configuration files.
For example, /etc/hadoop/conf.
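
If the exclude files do not exist yet, create them as empty files so that the NameNode and ResourceManager can read them at startup. A minimal sketch, assuming <HADOOP_CONF_DIR> is /etc/hadoop/conf:

# Create empty exclude files; hostnames are added later, at decommission time.
touch /etc/hadoop/conf/dfs.exclude
touch /etc/hadoop/conf/yarn.exclude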

Decommission DataNodes or NodeManagers

Nodes normally run both a DataNode and a NodeManager, and both are typically commissioned or decommissioned together.

With the replication factor set to three, HDFS is resilient to individual DataNode failures. However, there is a high risk of data loss if you terminate DataNodes without decommissioning them first. Nodes must be decommissioned on a schedule that allows time for all of the blocks they host to be replicated elsewhere.
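
Before you begin, you can check overall block health from the command line; hdfs fsck reports missing and under-replicated blocks. A minimal sketch, assuming <HDFS_USER> is hdfs:

su <HDFS_USER>
# Summarizes block health for the whole filesystem, including
# any missing or under-replicated blocks.
hdfs fsck /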

On the other hand, if a NodeManager is shut down, the ResourceManager simply reschedules its tasks on other nodes in the cluster. However, decommissioning a NodeManager is still useful when you want a node to stop accepting new tasks, or when its tasks are long-running but you still want to be agile in your cluster management.


Decommission DataNodes

Use the following instructions to decommission DataNodes in your cluster:

• On the NameNode host machine, edit the <HADOOP_CONF_DIR>/dfs.exclude file and add the list of DataNode hostnames (separated by a newline character), where <HADOOP_CONF_DIR> is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf. A sample of the file contents is shown below.
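
The format is one hostname per line; the hostnames below are hypothetical, and the same format applies to the yarn.exclude file used later in this article:

worker-node-03.example.com
worker-node-07.example.com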

• Update the NameNode with the new set of excluded DataNodes. On the NameNode host machine, execute the following command:

su <HDFS_USER>
hdfs dfsadmin -refreshNodes


where <HDFS_USER> is the user owning the HDFS services. For example, hdfs.

• Open the NameNode web UI (http://<NameNode_FQDN>:50070) and navigate to the DataNodes page. Check to see whether the state has changed to Decommission In Progress for the DataNodes being decommissioned.

• When all the DataNodes report their state as Decommissioned (on the DataNodes page, or on the Decommissioned Nodes page of the ResourceManager UI at http://<ResourceManager_FQDN>:8088/cluster/nodes/decommissioned), all of the blocks have been replicated. You can then shut down the decommissioned nodes.
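
You can also verify the state from the command line; the report lists each DataNode along with a Decommission Status field. A minimal sketch, assuming <HDFS_USER> is hdfs:

su <HDFS_USER>
# Per-DataNode report; look for "Decommission Status : Decommissioned"
hdfs dfsadmin -report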

• If your cluster utilizes a dfs.include file, remove the decommissioned nodes from the <HADOOP_CONF_DIR>/dfs.include file on the NameNode host machine, then execute the following command:

su <HDFS_USER>
hdfs dfsadmin -refreshNodes


Note: If no dfs.include file is specified, all DataNodes are considered to be included in the cluster (unless excluded in the dfs.exclude file). The dfs.hosts and dfs.hosts.exclude properties in hdfs-site.xml are used to specify the dfs.include and dfs.exclude files.
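
For reference, an include file is wired up in the same way as the exclude file in the prerequisites; a sketch, assuming the same <HADOOP_CONF_DIR>:

<property>
  <name>dfs.hosts</name>
  <value><HADOOP_CONF_DIR>/dfs.include</value>
</property>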

Decommission NodeManagers

Use the following instructions to decommission NodeManagers in your cluster:

• On the ResourceManager host machine, edit the <HADOOP_CONF_DIR>/yarn.exclude file and add the list of NodeManager hostnames (separated by a newline character), where <HADOOP_CONF_DIR> is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.

• If your cluster utilizes a yarn.include file, remove the decommissioned nodes from
the <HADOOP_CONF_DIR>/yarn.include file on the ResourceManager host machine.

Note: If no yarn.include file is specified, all NodeManagers are considered to be included in the cluster (unless excluded in the yarn.exclude file). The yarn.resourcemanager.nodes.include-path and yarn.resourcemanager.nodes.exclude-path properties in yarn-site.xml are used to specify the yarn.include and yarn.exclude files.

• Update the ResourceManager with the new set of NodeManagers. On the ResourceManager host machine, execute the following command:

su <YARN_USER>
yarn rmadmin -refreshNodes

where <YARN_USER> is the user who owns the YARN services. For example, yarn.
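
To confirm that the nodes were removed, you can list NodeManagers by state from the command line. A minimal sketch, assuming <YARN_USER> is yarn:

su <YARN_USER>
# Lists NodeManagers whose decommissioning has completed
yarn node -list -states DECOMMISSIONED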

Decommission HBase RegionServers

Use the following instruction to decommission HBase RegionServers in your cluster. On the RegionServer host that you want to decommission, execute:

su <HBASE_USER>
/usr/hdp/current/hbase-client/bin/hbase-daemon.sh stop regionserver


where <HBASE_USER> is the user who owns the HBase services. For example, hbase.

The RegionServer closes all of its regions and then shuts down.
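
Alternatively, HBase ships a graceful_stop.sh script that moves regions off the server before stopping it, avoiding a burst of region reassignments when the process dies. A sketch, assuming the same HDP layout as above:

su <HBASE_USER>
# Unloads regions from the target server, then stops its RegionServer
/usr/hdp/current/hbase-client/bin/graceful_stop.sh <RegionServer_FQDN>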

