How to decommission a data node

I realized that running 6 nodes on 2 laptops was overstretching the hardware. Some data nodes died when running heavy jobs like Spark machine learning, which can consume a significant amount of memory and CPU.

So I am going to reduce the number of nodes to 4 and give each one more memory (4GB -> 8GB).

Step 1: Label hosts to be decommissioned

Check whether an exclude file is configured. Here you will get an error message because we have not configured one yet.

[hadoop@nnode1 ~]$ hdfs getconf -excludeFile
Configuration dfs.hosts.exclude is missing.

Since it is missing, modify $HADOOP_CONF_DIR/hdfs-site.xml and add the following entry. I put the to-be-decommissioned nodes in a file on the master name node, /home/hadoop/hdfs.exclude.list.
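The entry points the name node at the exclude list; the property name comes from the error message above, and the path is the file on the master name node:

```xml
<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/hdfs.exclude.list</value>
</property>
```

The exclude file itself simply lists the hostnames to decommission, one per line (in this cluster, dnode4).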


Step 2: Tell Hadoop to relocate data blocks

Once the configuration is in place, do a node refresh. The refresh runs in the background, and you should get a message back quickly indicating that the process has started. Relocating the data blocks away from the nodes being decommissioned will take some time, but when you check the status you should see each node's status change to “Decommission in progress”.

[hadoop@nnode1 ~]$ hdfs getconf -excludeFile
/home/hadoop/hdfs.exclude.list
[hadoop@nnode1 ~]$ hdfs dfsadmin -refreshNodes
Refresh nodes successful
[hadoop@nnode1 ~]$ hdfs dfsadmin -report
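The full report is long. To watch just the per-node status, you can filter it with grep (a sketch; on the cluster you would pipe `hdfs dfsadmin -report` into the same pattern — here it runs on a sample excerpt so the pattern is visible):

```shell
# Filter a dfsadmin report down to the per-node status lines.
# On the name node: hdfs dfsadmin -report | grep -E "^(Name|Hostname|Decommission Status)"
grep -E "^(Name|Hostname|Decommission Status)" <<'EOF'
Name: (dnode4)
Hostname: dnode4
Decommission Status : Decommission in progress
Last contact: ...
EOF
```

Non-matching report fields (like “Last contact”) are dropped, leaving one compact block per data node.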

On nnode1 (the master name node), tail the name node log file, and you will see blocks being relocated.

tail -f hadoop-hadoop-namenode-nnode1.log

When the whole process is done, you should see messages like these:

2017-02-09 10:35:14,106 INFO org.apache.hadoop.hdfs.server.blockmanagement.DecommissionManager: Decommissioning complete for node
2017-02-09 10:35:14,106 INFO org.apache.hadoop.hdfs.server.blockmanagement.DecommissionManager: Checked 596 blocks and 1 nodes this tick

Now, if you run the HDFS report again, you will see the node status changed to “Decommissioned”.

[hadoop@nnode1 ~]$ hdfs dfsadmin -report
Name: (dnode4)
Hostname: dnode4
Decommission Status : Decommissioned

Step 3: Permanently remove the hosts from slaves file

If you are not going to put the decommissioned node back to work, permanently delete it from the slaves file. That way, Hadoop will not try to start the data node on the next cluster start.
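As a sketch (the hostname dnode4 and the standard slaves file location under $HADOOP_CONF_DIR are assumptions from this cluster's setup):

```shell
# Sketch: drop the decommissioned host from the slaves file so that
# start-dfs.sh no longer tries to launch a DataNode on it.
# dnode4 and $HADOOP_CONF_DIR follow this cluster's setup.
SLAVES="${HADOOP_CONF_DIR:-/tmp/hadoop-conf}/slaves"
if [ -f "$SLAVES" ]; then
  sed -i '/^dnode4$/d' "$SLAVES"
fi
```

You can also remove the host from /home/hadoop/hdfs.exclude.list afterwards, since it no longer appears in the cluster at all.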