Install and configure ZooKeeper

ZooKeeper version: 3.4.9 (http://zookeeper.apache.org)

When we installed HBase last time, we configured the ZooKeeper that is packaged with it. This time we install a separate ZooKeeper so that it can work with other distributed applications, such as Kafka.

ZooKeeper is a cluster management application, and many other Apache distributed applications use it as their cluster management platform. ZooKeeper is itself a distributed application, but it is distributed only over a quorum of servers. A quorum allows reliable service when servers fail. The basic rule is to run 2n+1 nodes in order to tolerate up to n node failures without disrupting cluster operation. So to tolerate 1 node failure we need a minimum of 3 ZooKeeper nodes, and to tolerate 2 simultaneous failures we need 5. Keep in mind that the ZooKeeper quorum is not the same as the cluster size; the cluster can be much larger. For example, you may have ZooKeeper installed on only 5 nodes while the cluster has 100 nodes.
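The 2n+1 rule can be checked with a little shell arithmetic. The sketch below is illustrative only; the function names zk_quorum_size and zk_tolerated_failures are made up for this example and are not part of ZooKeeper.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the 2n+1 quorum rule.

# Servers that must be up for an ensemble of size $1 (a strict majority).
zk_quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Simultaneous server failures an ensemble of size $1 can tolerate.
zk_tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 3 4 5; do
    echo "ensemble=$n quorum=$(zk_quorum_size "$n") tolerated_failures=$(zk_tolerated_failures "$n")"
done
```

Note that a 4-node ensemble tolerates no more failures than a 3-node one, which is why odd ensemble sizes are the usual choice.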

The ZooKeeper software package can be downloaded from the Apache ZooKeeper site (http://zookeeper.apache.org). Version 3.4.9 is the stable release as of March 2017. Once downloaded, extract the package into the folder where we installed all the other big data applications.

[hadoop@nnode1 Downloads]$ pwd
/home/hadoop/Downloads
[hadoop@nnode1 Downloads]$ ls zookeeper-3.4.9.tar.gz
zookeeper-3.4.9.tar.gz
[hadoop@nnode1 Downloads]$ tar xvfz zookeeper-3.4.9.tar.gz
[hadoop@nnode1 Downloads]$ sudo mv zookeeper-3.4.9 /opt/app
[hadoop@nnode1 Downloads]$ cd /opt/app/zookeeper-3.4.9/conf
[hadoop@nnode1 conf]$ cp zoo_sample.cfg zoo.cfg

The default configuration file is conf/zoo.cfg. We will put the following configuration in it.

[hadoop@nnode1 conf]$ cat zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=5
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=2
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/data/zkdata
# the port at which the clients will connect
clientPort=2181
server.1=nnode1:2888:3888
server.2=nnode2:2888:3888
server.3=dnode1:2888:3888

Here we configured /opt/data/zkdata as the storage location for the ZooKeeper database and log files. This storage location is required on every node that hosts ZooKeeper, so we create the folder on nnode1, nnode2, and dnode1.

[hadoop@nnode1 ~]$ su -
[root@nnode1 ~]# mkdir /opt/data/zkdata
[root@nnode1 ~]# chown hadoop:dev /opt/data/zkdata
[root@nnode1 ~]# ssh nnode2 'mkdir /opt/data/zkdata && chown hadoop:dev /opt/data/zkdata'
[root@nnode1 ~]# ssh dnode1 'mkdir /opt/data/zkdata && chown hadoop:dev /opt/data/zkdata'
[root@nnode1 ~]# ssh dnode2 'mkdir /opt/data/zkdata && chown hadoop:dev /opt/data/zkdata'

Next, on each ZooKeeper node, create a file /opt/data/zkdata/myid. It holds the unique id of that ZooKeeper node and must match the server.N entries in zoo.cfg: nnode1 has id 1, nnode2 has id 2, and dnode1 has id 3.
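One way to create the myid files is to derive each id from the server.N lines in zoo.cfg, so the two cannot drift apart. This is only a sketch, not the commands used in the original setup. The helper name zk_id_for_host is hypothetical, and the commented loop assumes passwordless ssh as the hadoop user.

```shell
#!/bin/bash
# Hypothetical helper: print the id N from the "server.N=host:..." line
# of a zoo.cfg file ($1) whose host matches $2. Prints nothing if the
# host is not listed.
zk_id_for_host() {
    sed -n "s/^server\.\([0-9][0-9]*\)=$2:.*/\1/p" "$1"
}

# Example use (run as the hadoop user on nnode1):
#   cfg=/opt/app/zookeeper-3.4.9/conf/zoo.cfg
#   for host in nnode1 nnode2 dnode1; do
#       id=$(zk_id_for_host "$cfg" "$host")
#       ssh "$host" "echo $id > /opt/data/zkdata/myid"
#   done
```

Each myid file should end up containing nothing but the id number on a single line.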

Next, add the following to the .bashrc file to set up the ZooKeeper environment at login.

export ZOOKEEPER_HOME=/opt/app/zookeeper-3.4.9
export PATH=$PATH:$ZOOKEEPER_HOME/bin

Next, create start-zookeeper and stop-zookeeper scripts under the $ZOOKEEPER_HOME/bin folder. The start-zookeeper script calls "zkServer.sh start" on each node, and the stop-zookeeper script calls "zkServer.sh stop" on each node.
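The scripts themselves are not reproduced here. A minimal sketch of what they could look like follows. It assumes passwordless ssh between the nodes, Java available to non-interactive ssh sessions, and the same install path on every node. The zk_all function, the ZK_HOSTS list, and the ZK_SSH override (handy for a dry run with echo) are hypothetical names introduced for this example.

```shell
#!/bin/bash
# Sketch of a helper shared by start-zookeeper and stop-zookeeper.
# Assumptions: passwordless ssh, same install path on every node.
ZOOKEEPER_HOME=${ZOOKEEPER_HOME:-/opt/app/zookeeper-3.4.9}
ZK_HOSTS=${ZK_HOSTS:-"nnode1 nnode2 dnode1"}

# Run "zkServer.sh <action>" on every ZooKeeper host.
# Set ZK_SSH=echo to print the commands instead of running them.
zk_all() {
    local action=$1 host
    for host in $ZK_HOSTS; do
        ${ZK_SSH:-ssh} "$host" "$ZOOKEEPER_HOME/bin/zkServer.sh $action"
    done
}

# start-zookeeper would end with:  zk_all start
# stop-zookeeper would end with:   zk_all stop
```

The full path to zkServer.sh is used because a non-interactive ssh session may not pick up the PATH set in .bashrc.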

Make the scripts executable.

[hadoop@nnode1 bin]$ chmod +x start-zookeeper
[hadoop@nnode1 bin]$ chmod +x stop-zookeeper

To propagate the ZooKeeper installation to the other nodes, we use rsync.

for i in $(grep -v nnode1 /home/hadoop/mysites); do
    rsync -avzhe ssh /opt/app/zookeeper-3.4.9 "$i":/opt/app
    rsync -avzhe ssh /home/hadoop/.bashrc "$i":/home/hadoop
done

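To verify the installation, start ZooKeeper and then run zkCli.sh as follows to connect to each node in turn. At the zkCli prompt, run "ls /" to list the root znodes. A fresh installation returns only the built-in zookeeper znode; the sample output below also shows znodes created by Kafka.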

[hadoop@nnode1 zookeeper-3.4.9]$ zkCli.sh -server nnode1:2181
[hadoop@nnode1 zookeeper-3.4.9]$ zkCli.sh -server nnode2:2181
[hadoop@nnode1 zookeeper-3.4.9]$ zkCli.sh -server dnode1:2181

[zk: nnode2:2181(CONNECTED) 1] ls /
[cluster, controller_epoch, brokers, zookeeper, admin, isr_change_notification, consumers, config]
[zk: nnode2:2181(CONNECTED) 2]
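
Beyond listing znodes, it can help to confirm that the ensemble has elected a leader. As a sketch: ZooKeeper 3.4.x answers the four-letter command srvr on the client port, and the reply includes a "Mode:" line. The helper name zk_mode is hypothetical, and the commented loop assumes nc (netcat) is installed.

```shell
#!/bin/bash
# Hypothetical helper: read the reply to "srvr" on stdin and print
# the server's role (leader, follower, or standalone).
zk_mode() {
    sed -n 's/^Mode: //p'
}

# Example use:
#   for host in nnode1 nnode2 dnode1; do
#       echo "$host: $(echo srvr | nc "$host" 2181 | zk_mode)"
#   done
```

With three healthy servers, one should report leader and the other two follower. Running "zkServer.sh status" on each node reports the same information.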