A Hadoop Cluster Installation and Configuration Tutorial

Recently, we started looking into using Big Data technologies to better serve our customers' BI/reporting needs, for both big data and small data applications. The first task, of course, was to set up a cluster. I decided to try a 6-node Hadoop cluster so that I could get a real feel for how HDFS works and how it performs on truly commodity hardware.

I have one 4-core 32GB Dell Precision and one 2-core 16GB Lenovo Yoga, both with 2 hyper-threads per core. I figured the Dell can probably support 4 virtual hosts with 2 CPUs each, and the Lenovo can probably support 2. Both of my laptops run Windows 10, and I was not very comfortable at the beginning, given concerns about VM software stability on Windows 10, until it all worked.

Before we get started, it is worth mentioning that, while there are many Hadoop tutorials online, very few bother to document the system setup process in detail. That, in my experience, is the most time-consuming part. Once you get the systems configured properly, the Hadoop installation and configuration is all downhill. Well, maybe it depends on your technical background.

Also, it is wise to get the first host configured perfectly at the beginning. Once the first host is ready, you can clone out the other hosts quickly. Trying to make configuration changes on 6 hosts manually, on the other hand, is very error-prone.

System Configuration

The objective is to create 4 Linux hosts on the Dell and 2 Linux hosts on the Lenovo. The first step is to download and install Oracle VM VirtualBox. You can try VMware too, but VirtualBox worked very well for me. I installed version 5.1.14 on both laptops.

Next, download the Oracle Enterprise Linux ISO image. I tried version 6.8 in my first experiment, but it was packaged with old versions of Python and GCC, and that caused a lot of troubleshooting in my later Python/R programming. So I decided to recreate everything from scratch using OL7.3.

You can leave the ISO image on the hard disk. VirtualBox will create a virtual DVD device out of the ISO file, and that is much faster than burning the ISO image to a physical DVD and installing from it.

Step 1: create the first virtual machine

Name: nnode1 (this will be designated as the primary name node)
Type: Linux
Version: Oracle (64-bit)

You will see only 32-bit versions in the dropdown list if hardware virtualization technology is not enabled in the BIOS. If so, quit and change the BIOS configuration. Note that on UEFI-enabled systems, like my Lenovo, the way to launch the BIOS is through the advanced startup settings.

I gave each host 4GB of memory and 2 CPUs. These settings can be adjusted after the virtual machine is created. Next, choose "Create a virtual hard disk now", and choose the type "VDI (Virtual Disk Image)" with a "Fixed Size" of 50GB. I would recommend fixed size if, like me, you are working off laptops. If you have faster hard disks or are using a decent AWS host, you can try dynamically allocated size. If you are going to try other things like Cloudera, the virtual disk should be >= 40GB.

Next, you will need to change the network adaptor settings. Do not use NAT in this case. NAT works for single-node experiments, but it allows only internet access and does not allow inter-cluster communication. Recommended settings are:

Attached to: Bridged Adaptor
Name: (choose your wired network connection, this should be your physical/real wired network adaptor)
Adaptor Type: Paravirtualized Network (virtio-net)
Promiscuous Mode: Allow All
MAC Address: (accept the dynamically generated value)
Cable Connected: (let it be checked if it is based on a wired network connection)

If you don't have a physical wired network connection and have to use Wi-Fi, then choose one of the emulated Intel network adaptor types such as "Intel PRO/1000 MT Desktop".

I allocated 2 CPUs to each virtual host. You probably want to set a cap on the overall CPU usage if (# CPU per host) * (# hosts) is greater than (# available CPU).

Next, attach the downloaded Linux ISO image (which I saved as C:\VirutalBoxInstall\OL73.iso) to the virtual host.
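
If you prefer scripting over clicking through the wizard, the equivalent setup can be done with VBoxManage from the Windows host. This is only a rough sketch of the settings above: the bridged adapter name and the 80% CPU cap are placeholders you would adjust, and the ISO path is the one mentioned above.

# create and register the VM, then set memory, CPUs, and an overall CPU cap
VBoxManage createvm --name nnode1 --ostype Oracle_64 --register
VBoxManage modifyvm nnode1 --memory 4096 --cpus 2 --cpuexecutioncap 80
# bridged networking on the physical NIC, virtio adaptor, promiscuous mode allow-all
VBoxManage modifyvm nnode1 --nic1 bridged --bridgeadapter1 "YOUR PHYSICAL NIC NAME" --nictype1 virtio --nicpromisc1 allow-all
# 50GB fixed-size VDI disk plus the OL7.3 install DVD
VBoxManage createmedium disk --filename nnode1.vdi --size 51200 --variant Fixed
VBoxManage storagectl nnode1 --name SATA --add sata
VBoxManage storageattach nnode1 --storagectl SATA --port 0 --device 0 --type hdd --medium nnode1.vdi
VBoxManage storageattach nnode1 --storagectl SATA --port 1 --device 0 --type dvddrive --medium C:\VirutalBoxInstall\OL73.iso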

Step 2: install Linux on the first virtual host

Now you can start the just-created virtual machine. That will launch the Oracle Linux 7.3 installer. Note the DVD image file is mounted on the virtual host as a DVD device, and the default boot sequence starts from the DVD-ROM.

Follow the Oracle Linux installation steps to finish the installation. Recommended settings are:

– Choose "Server with GUI" as the installation type, and add the development tools and compatibility packages
– Enable Network Time Protocol (NTP)
– Disable KDump
– Enable and configure a static IPv4 address, or change the network settings after installation

Post installation steps are (a command sketch follows the list):

– Add a user hadoop and a group dev
– Add hadoop user to the sudoer list
– Configure a static IPv4 address, gateway, and DNS if it was not done during the installation
– Disable the IPv6 protocol
– Allow all network traffic to pass freely through the eth0 interface (add eth0 to the public zone if it is not already)
– Enable automatic network connection
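
Most of these post-install steps can also be done from a root terminal. The sketch below reflects my assumptions (group dev, user hadoop, interface eth0, chrony as the OL7 NTP client); the sudoer entry and the static IP address are covered in the next steps, and the exact uid/gid values only need to be consistent across hosts, which cloning takes care of.

# create the dev group and the hadoop user, then set a password
groupadd dev
useradd -G dev hadoop
passwd hadoop
# let traffic pass freely on eth0 by putting it in the public zone
firewall-cmd --permanent --zone=public --change-interface=eth0
firewall-cmd --reload
# keep clocks in sync across the cluster
systemctl enable --now chronyd
# one common way to disable IPv6 (IPV6INIT=no in ifcfg-eth0 also works)
echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.conf
sysctl -p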

The network configuration interface can be accessed via the menu Applications -> System Tools -> Settings -> Network. At the end, you should see content like the following in /etc/sysconfig/network-scripts/ifcfg-eth0:

[hadoop@nnode1 network-scripts]$ pwd
/etc/sysconfig/network-scripts
[hadoop@nnode1 network-scripts]$ cat ifcfg-eth0
TYPE="Ethernet"
BOOTPROTO=none
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT=no
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="eth0"
UUID="c5ca6c3c-d094-4f7c-93e1-22aa59e9c911"
DEVICE="eth0"
ONBOOT="yes"
ZONE=public
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
DNS1=192.168.0.1
IPADDR=192.168.0.221
PREFIX=24
GATEWAY=192.168.0.1
[hadoop@nnode1 network-scripts]$
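
If you would rather not edit ifcfg-eth0 by hand, roughly the same configuration can be applied with nmcli. This is just a sketch matching the file above; the connection name (eth0 here) and the addresses need to match your environment.

nmcli connection modify eth0 \
    ipv4.method manual ipv4.addresses 192.168.0.221/24 \
    ipv4.gateway 192.168.0.1 ipv4.dns 192.168.0.1 \
    ipv6.method ignore connection.zone public connection.autoconnect yes
nmcli connection up eth0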

Step 3: configure the first virtual host

Log in to the new system nnode1 as hadoop.

Next, start a terminal session, switch to root using the command su -, and then create the directories /opt/app and /opt/data. /opt/app will contain the Hadoop software and configuration files, and /opt/data will contain the Hadoop file system, including any temporary working folders. Although this is just a practice setup, it is always good to separate the data from the software and configuration. In a production environment, you can put these two folders on different physical storage, which helps with redundancy and backup/recovery.

Next, add the hadoop user to the sudoer list by locating the root line shown below in the /etc/sudoers file and adding the second line for hadoop (edit the file with visudo).
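
The stock /etc/sudoers on Oracle Linux contains a root entry similar to the first line below; the hadoop line is the one you add:

## Allow root to run any commands anywhere
root      ALL=(ALL)       ALL
hadoop    ALL=(ALL)       ALL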

Next, add the following to /etc/hosts on nnode1. Adjust the IP addresses to fit your environment.

192.168.0.221 nnode1
192.168.0.222 nnode2
192.168.0.223 dnode1
192.168.0.224 dnode2
192.168.0.225 dnode3
192.168.0.226 dnode4

Next, download Hadoop 2.7.3 from the Apache Hadoop releases page. For our purposes, you just need the binary distribution: download the file hadoop-2.7.3.tar.gz and move it to the /opt/app folder, then use the following commands to unpack the files and change the owner to hadoop.

Start a terminal session and run the following:

cd
cd Downloads
sudo mv hadoop-2.7.3.tar.gz /opt/app
cd /opt/app
sudo tar xvfz hadoop-2.7.3.tar.gz
sudo chown -R hadoop:dev /opt/app/hadoop-2.7.3

By now you have the software installed on the first node (nnode1). Next, let's shut down nnode1 and clone it to the other virtual hosts. Run the following command in a terminal to shut down nnode1.

sudo init 0

Step 4: install VirtualBox guest additions

Optionally, you can install the Oracle VirtualBox guest additions by following the VirtualBox guest additions documentation. This will make the Linux GUI more user friendly, but it is not essential for the Hadoop configuration. If you decide to do this, change the storage settings in VirtualBox Manager for the virtual machine nnode1 and choose VBoxGuestAdditions.iso as the optical drive. Do that while the virtual machine is powered off, then start the virtual machine. The guest additions DVD should be mounted under /run/media/hadoop.

Start a terminal session, switch to root, change to /run/media/hadoop/VBOXADDITIONS_5.1.14_112924, and then execute the command ./VBoxLinuxAdditions.run.

Shutdown the virtual machine again.

Step 5: clone nnode1 5 times

In Oracle VirtualBox Manager, make sure nnode1 is powered off. Right-click on nnode1 and select "Clone…". Rename the new machine to nnode2 (then dnode1, dnode2, etc.), and check "Reinitialize the MAC address of all network cards".

Choose “Full clone”.

Do the same to clone out the other nodes. One caveat: always clone from the first node (nnode1). Oracle VM can mess up some network settings every time a virtual machine is cloned; if you clone a clone, the settings might get wacky and hard to fix.

Step 6: configure network on cloned hosts

These steps are to be carried out on all hosts.

First, the cloned virtual machines carry the original host name. You need to log in to each host as root and modify the /etc/hostname file to correct that.

Next, clean up /etc/hosts. Depending on which version of Oracle Linux you downloaded, the /etc/hosts file might get screwed up during the clone process; I noticed this with OEL 6.8. This is a critical configuration, so make sure the file is exactly the same on all newly cloned hosts.

Next, fix the IPv4 address on all cloned hosts by editing /etc/sysconfig/network-scripts/ifcfg-eth0. For example, on nnode2, you should change the configuration like the following:

[hadoop@nnode2 ~]$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
IPADDR=192.168.0.222

Last, make sure the eth0 interface will connect automatically. The setting is under Applications -> System Tools -> Network -> Wired -> Settings (icon) -> Identity.
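
A command-line alternative for these per-clone fixes, sketched for nnode2 (hostnamectl and nmcli are both part of OL7; adjust the host name and address for each node):

hostnamectl set-hostname nnode2
nmcli connection modify eth0 ipv4.addresses 192.168.0.222/24 connection.autoconnect yes
nmcli connection up eth0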

Once corrected, issue "init 6" as root to restart the virtual machine.

Step 7: setup SSH auto login

For SSH auto login to work, the hadoop user has to have the same numeric user id on all hosts. If you cloned every host properly, the user id should be identical on all of them; but if anything went off plan, it can differ, so it is critical to check it with the id command.

You will need to run the following commands one by one since SSH auto login is not enabled yet, but you can do it all from nnode1.

[hadoop@nnode1 ~]$ id hadoop
uid=5001(hadoop) gid=5001(hadoop) groups=5001(hadoop),1000(dev)
[hadoop@nnode1 ~]$ ssh nnode2 'id hadoop'
uid=5001(hadoop) gid=5001(hadoop) groups=5001(hadoop),1000(dev)
[hadoop@nnode1 ~]$ ssh dnode1 'id hadoop'
uid=5001(hadoop) gid=5001(hadoop) groups=5001(hadoop),1000(dev)
[hadoop@nnode1 ~]$ ssh dnode2 'id hadoop'
uid=5001(hadoop) gid=5001(hadoop) groups=5001(hadoop),1000(dev)
[hadoop@nnode1 ~]$ ssh dnode3 'id hadoop'
uid=5001(hadoop) gid=5001(hadoop) groups=5001(hadoop),1000(dev)
[hadoop@nnode1 ~]$ ssh dnode4 'id hadoop'
uid=5001(hadoop) gid=5001(hadoop) groups=5001(hadoop),1000(dev)

If the user id (or group id) is different on any host, use usermod to correct it, for example:

usermod -u 5001 -g 5001 hadoop

Next, run ssh-keygen -t rsa on each host as the hadoop user. You will need to run the following commands one by one since SSH auto login is not enabled yet, but you can do it all from nnode1.

[hadoop@nnode1 .ssh]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[hadoop@nnode1 .ssh]$ ssh nnode2 "ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa"
[hadoop@nnode1 .ssh]$ ssh dnode1 "ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa"
[hadoop@nnode1 .ssh]$ ssh dnode2 "ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa"
[hadoop@nnode1 .ssh]$ ssh dnode3 "ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa"
[hadoop@nnode1 .ssh]$ ssh dnode4 "ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa"

Do the same for the root user. Again, run the following commands one by one from nnode1 since SSH auto login is not enabled yet.

[hadoop@nnode1 .ssh]$ su -
[root@nnode1 ~]# cd .ssh
[root@nnode1 .ssh]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[root@nnode1 .ssh]# ssh nnode2 "ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa"
[root@nnode1 .ssh]# ssh dnode1 "ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa"
[root@nnode1 .ssh]# ssh dnode2 "ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa"
[root@nnode1 .ssh]# ssh dnode3 "ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa"
[root@nnode1 .ssh]# ssh dnode4 "ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa"

That will generate two RSA key files under the /home/hadoop/.ssh and /root/.ssh folders. The file id_rsa contains the private key, and id_rsa.pub contains the public key.

Next, we will merge all the public keys into a file called authorized_keys and then synchronize the merged file to all the other clones. Again, run the following commands one by one from nnode1; SSH auto login is not enabled yet.

[hadoop@nnode1 .ssh]$ cat id_rsa.pub > authorized_keys
[hadoop@nnode1 .ssh]$ ssh nnode2 'cat ~/.ssh/id_rsa.pub' >> authorized_keys
[hadoop@nnode1 .ssh]$ ssh dnode1 'cat ~/.ssh/id_rsa.pub' >> authorized_keys
[hadoop@nnode1 .ssh]$ ssh dnode2 'cat ~/.ssh/id_rsa.pub' >> authorized_keys
[hadoop@nnode1 .ssh]$ ssh dnode3 'cat ~/.ssh/id_rsa.pub' >> authorized_keys
[hadoop@nnode1 .ssh]$ ssh dnode4 'cat ~/.ssh/id_rsa.pub' >> authorized_keys
[hadoop@nnode1 .ssh]$ chmod 600 authorized_keys
[hadoop@nnode1 .ssh]$ rsync -avzhe ssh /home/hadoop/.ssh/authorized_keys nnode2:/home/hadoop/.ssh
[hadoop@nnode1 .ssh]$ rsync -avzhe ssh /home/hadoop/.ssh/authorized_keys dnode1:/home/hadoop/.ssh
[hadoop@nnode1 .ssh]$ rsync -avzhe ssh /home/hadoop/.ssh/authorized_keys dnode2:/home/hadoop/.ssh
[hadoop@nnode1 .ssh]$ rsync -avzhe ssh /home/hadoop/.ssh/authorized_keys dnode3:/home/hadoop/.ssh
[hadoop@nnode1 .ssh]$ rsync -avzhe ssh /home/hadoop/.ssh/authorized_keys dnode4:/home/hadoop/.ssh

Now, do the same for root user.

[hadoop@nnode1 .ssh]$ su -
[root@nnode1 ~]# cd .ssh

[root@nnode1 .ssh]# cat id_rsa.pub > authorized_keys
[root@nnode1 .ssh]# ssh nnode2 'cat ~/.ssh/id_rsa.pub' >> authorized_keys
[root@nnode1 .ssh]# ssh dnode1 'cat ~/.ssh/id_rsa.pub' >> authorized_keys
[root@nnode1 .ssh]# ssh dnode2 'cat ~/.ssh/id_rsa.pub' >> authorized_keys
[root@nnode1 .ssh]# ssh dnode3 'cat ~/.ssh/id_rsa.pub' >> authorized_keys
[root@nnode1 .ssh]# ssh dnode4 'cat ~/.ssh/id_rsa.pub' >> authorized_keys
[root@nnode1 .ssh]# chmod 600 authorized_keys
[root@nnode1 .ssh]# rsync -avzhe ssh /root/.ssh/authorized_keys nnode2:/root/.ssh
[root@nnode1 .ssh]# rsync -avzhe ssh /root/.ssh/authorized_keys dnode1:/root/.ssh
[root@nnode1 .ssh]# rsync -avzhe ssh /root/.ssh/authorized_keys dnode2:/root/.ssh
[root@nnode1 .ssh]# rsync -avzhe ssh /root/.ssh/authorized_keys dnode3:/root/.ssh
[root@nnode1 .ssh]# rsync -avzhe ssh /root/.ssh/authorized_keys dnode4:/root/.ssh

Now you should be able to SSH from nnode1 to all cloned nodes without being asked for a password, both as the hadoop user and as root.

Note the authorized_keys file needs to be in mode 600.
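
A quick way to verify is a loop like the one below, run from nnode1 as hadoop (and again as root after su -): every host name should print without a password prompt.

for h in nnode2 dnode1 dnode2 dnode3 dnode4; do
  ssh $h hostname
done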

Step 8: configure hadoop on nnode1

Log in as root on nnode1 and create a new file hadoop.sh under the /etc/profile.d/ folder with the following content.

HADOOP_HOME=/opt/app/hadoop-2.7.3
export HADOOP_HOME

HADOOP_PREFIX=/opt/app/hadoop-2.7.3
export HADOOP_PREFIX

HADOOP_COMMON_HOME=/opt/app/hadoop-2.7.3
export HADOOP_COMMON_HOME

HADOOP_MAPRED_HOME=/opt/app/hadoop-2.7.3
export HADOOP_MAPRED_HOME

HADOOP_HDFS_HOME=/opt/app/hadoop-2.7.3
export HADOOP_HDFS_HOME

YARN_HOME=/opt/app/hadoop-2.7.3
export YARN_HOME

Log in as hadoop on nnode1 and modify /opt/app/hadoop-2.7.3/etc/hadoop/hadoop-env.sh as follows:

# The java implementation to use.
export JAVA_HOME=/usr

On nnode1, modify core-site.xml under $HADOOP_HOME/etc/hadoop.

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nnode1:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/data/hdfs</value>
  </property>
</configuration>

On nnode1, modify hdfs-site.xml under $HADOOP_HOME/etc/hadoop.

<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///opt/data/hdfs/dfs/dnode-dat</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/data/hdfs/dfs/nnode-dat</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>nnode2:50090</value>
  </property>
</configuration>

On nnode1, modify yarn-site.xml under $HADOOP_HOME/etc/hadoop.

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>nnode2</value>
    <description>The hostname of the ResourceManager</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>shuffle service for MapReduce</description>
  </property>
</configuration>

On nnode1, modify mapred-site.xml under $HADOOP_HOME/etc/hadoop.

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.admin.user.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>nnode2:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>nnode2:19888</value>
  </property>
</configuration>

On nnode1, add the following to the start-yarn.sh file under $HADOOP_HOME/sbin, so that the JobHistory server daemon starts along with the Yarn daemons.
# start jobhistory server
"$bin"/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver

Do the same accordingly in stop-yarn.sh.
# stop jobhistory server
"$bin"/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver

On nnode1, add the following to the slaves file under $HADOOP_HOME/etc/hadoop.

dnode1
dnode2
dnode3
dnode4

On nnode1, add the following to the /home/hadoop/mysites file. This file is not used by Hadoop itself; it is simply the node list our sync scripts will loop over later.

nnode1
nnode2
dnode1
dnode2
dnode3
dnode4

On nnode1, append/modify the following in /home/hadoop/.bash_profile.

export JAVA_HOME=/usr
export PATH=$PATH:$HOME/bin:/usr/local/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Step 9: synchronize hadoop configuration to all other nodes

Run the following commands to create the /opt/data/hdfs folder on every node and set its owner to the hadoop user.

[hadoop@nnode1 hadoop]$ sudo mkdir /opt/data/hdfs
[hadoop@nnode1 hadoop]$ sudo chown hadoop:dev /opt/data/hdfs
[hadoop@nnode1 hadoop]$ ssh root@nnode2 'mkdir /opt/data/hdfs && chown hadoop:dev /opt/data/hdfs'
[hadoop@nnode1 hadoop]$ ssh root@dnode1 'mkdir /opt/data/hdfs && chown hadoop:dev /opt/data/hdfs'
[hadoop@nnode1 hadoop]$ ssh root@dnode2 'mkdir /opt/data/hdfs && chown hadoop:dev /opt/data/hdfs'
[hadoop@nnode1 hadoop]$ ssh root@dnode3 'mkdir /opt/data/hdfs && chown hadoop:dev /opt/data/hdfs'
[hadoop@nnode1 hadoop]$ ssh root@dnode4 'mkdir /opt/data/hdfs && chown hadoop:dev /opt/data/hdfs'

Log in on nnode1 as root and run the following script to copy the /etc/profile.d/hadoop.sh file to the other nodes.

for i in `cat /home/hadoop/mysites | grep -v nnode1`; do
rsync -avzhe ssh /etc/profile.d/hadoop.sh $i:/etc/profile.d
done

Log in on nnode1 as the hadoop user and run the following script to copy the Hadoop configuration files to the other nodes.

for i in `cat /home/hadoop/mysites | grep -v nnode1`; do
rsync -avzhe ssh /opt/app/hadoop-2.7.3/etc/hadoop/core-site.xml $i:/opt/app/hadoop-2.7.3/etc/hadoop
rsync -avzhe ssh /opt/app/hadoop-2.7.3/etc/hadoop/hdfs-site.xml $i:/opt/app/hadoop-2.7.3/etc/hadoop
rsync -avzhe ssh /opt/app/hadoop-2.7.3/etc/hadoop/yarn-site.xml $i:/opt/app/hadoop-2.7.3/etc/hadoop
rsync -avzhe ssh /opt/app/hadoop-2.7.3/etc/hadoop/mapred-site.xml $i:/opt/app/hadoop-2.7.3/etc/hadoop
rsync -avzhe ssh /opt/app/hadoop-2.7.3/etc/hadoop/slaves $i:/opt/app/hadoop-2.7.3/etc/hadoop
rsync -avzhe ssh /home/hadoop/.bash_profile $i:/home/hadoop
rsync -avzhe ssh /opt/app/hadoop-2.7.3/sbin/start-yarn.sh $i:/opt/app/hadoop-2.7.3/sbin
rsync -avzhe ssh /opt/app/hadoop-2.7.3/sbin/stop-yarn.sh $i:/opt/app/hadoop-2.7.3/sbin
done

Step 10: start and verify Hadoop

On nnode1, while logged in as hadoop, run the following to format HDFS.

[hadoop@nnode1 ~]$ . /etc/profile.d/hadoop.sh
[hadoop@nnode1 ~]$ . ~/.bash_profile
[hadoop@nnode1 ~]$ hdfs namenode -format

On nnode1, while logged in as hadoop, run the following to start HDFS and check its status. You should see 4 running data nodes if all worked out. Otherwise, go to one of the data nodes and check the logs.

[hadoop@nnode1 ~]$ start-dfs.sh
[hadoop@nnode1 ~]$ hdfs dfsadmin -report

Once HDFS is up, we create an hdfs:///tmp folder and set proper permissions on it. This folder is like the /tmp folder on Linux.

[hadoop@nnode1 ~]$ hadoop fs -mkdir /tmp
[hadoop@nnode1 ~]$ hadoop fs -chmod -R 1777 /tmp
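
As a quick sanity check (a sketch using standard hadoop/hdfs commands, with /etc/hosts as an arbitrary small test file), you can push a file into HDFS and confirm its blocks are replicated across the data nodes:

hadoop fs -put /etc/hosts /tmp/hosts-test
hadoop fs -ls /tmp
hdfs fsck /tmp/hosts-test -files -blocks -locations
hadoop fs -rm /tmp/hosts-test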

Step 11: start and verify Yarn

On nnode2, while logged in as hadoop, run the following commands to start Yarn and check its status. If all is successful, it should list 4 running node managers.

[hadoop@nnode2 ~]$ start-yarn.sh
[hadoop@nnode2 ~]$ yarn node -list
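
To exercise Yarn and the JobHistory server end to end, you can run one of the bundled example jobs; this assumes the examples jar sits at its usual location inside the 2.7.3 binary distribution.

yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 8 1000

The running job should show up in the ResourceManager web UI at nnode2:8088 and, after completion, in the JobHistory web UI at nnode2:19888 (as configured in mapred-site.xml).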

Additional configurations

The GNOME Linux GUI can consume a significant amount of CPU and memory. If you have 2 or 3 guest hosts running on a laptop, you will end up with performance problems. In an operational environment, you only need a GUI on the edge server; all other nodes can be operated through SSH. In this setup, we can keep the GUI on nnode1 only; nnode2 and all the data nodes (dnodeX) just need SSH.

To disable GUI on OEL7, run following in a terminal as root:
systemctl set-default multi-user.target

To enable GUI again, run following in a terminal as root:
systemctl set-default graphical.target

In both cases, you will need to restart the system (init 6) for the change to take effect.
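
Since root-to-root SSH auto login is already set up, you can switch every node except nnode1 to the text console in one pass from nnode1; a sketch using the mysites list from earlier:

for i in `cat /home/hadoop/mysites | grep -v nnode1`; do
  ssh root@$i 'systemctl set-default multi-user.target && init 6'
done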

When things go wrong

You might run into problems starting HDFS or Yarn. Check the log files under $HADOOP_HOME/logs; if the cause is the firewall settings, you can try the following troubleshooting steps. This can be particularly useful if you are using Oracle Linux 7.

On nnode1, while logged in as root, save the following to /root/setfirewall.sh.

firewall-cmd --set-default-zone=internal
firewall-cmd --zone=internal --change-interface=eth0
firewall-cmd --zone=internal --change-interface=virbr0
firewall-cmd --permanent --zone=internal --add-source=192.168.0.221/24
firewall-cmd --permanent --zone=internal --add-source=192.168.0.222/24
firewall-cmd --permanent --zone=internal --add-source=192.168.0.223/24
firewall-cmd --permanent --zone=internal --add-source=192.168.0.224/24
firewall-cmd --permanent --zone=internal --add-source=192.168.0.225/24
firewall-cmd --permanent --zone=internal --add-source=192.168.0.226/24
firewall-cmd --permanent --zone=internal --add-port=8000-9999/tcp
firewall-cmd --permanent --zone=internal --add-port=50000-50099/tcp
firewall-cmd --reload

Change the permission of the file: chmod +x /root/setfirewall.sh
Execute the script: /root/setfirewall.sh

Copy the script to the other nodes and execute it there:

for i in `cat /home/hadoop/mysites | grep -v nnode1`; do
rsync -avzhe ssh /root/setfirewall.sh $i:/root
ssh $i '/root/setfirewall.sh'
done

That will open up all the ports needed for Hadoop.
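
To confirm the zone changes took effect, a quick check on any node (as root) looks like this:

firewall-cmd --get-default-zone
firewall-cmd --zone=internal --list-all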

Reference material:

Oracle VM VirtualBox download
Steps to setup a single node Hadoop Cluster
Simple Hadoop MapReduce programming