
How To: Configure High Availability for the Interset 3.1.5 Platform

WE STRONGLY RECOMMEND USING THE ATTACHED FILES, WHICH CAN BE MODIFIED TO MEET YOUR ENVIRONMENT'S REQUIREMENTS.

 

Infrastructure

This article assumes a nine-server infrastructure and uses the roles outlined below:

Note: zk = zookeeper

ha-ft-reporting (investigator)
ha-ft-analytics (zk server1, hdfs journalnode1, spark master1, ingest, analytics, investigator)

ha-ft-master01 (zk server2, hdfs zk failover controller1, hdfs journalnode2, namenode1, hbase master1)
ha-ft-master02 (zk server3, hdfs zk failover controller2, hdfs journalnode3, namenode2, hbase master2)

ha-ft-data01 (hdfs data1, regionserver1, spark worker1)
ha-ft-data02 (hdfs data2, regionserver2, spark worker2)
ha-ft-data03 (hdfs data3, regionserver3, spark worker3)
ha-ft-data04 (hdfs data4, regionserver4, spark worker4)
ha-ft-data05 (hdfs data5, regionserver5, spark worker5)

 

Configuration Steps:

- These steps assume that the Hadoop, HBase, Spark, and Interset archives have already been extracted to /opt/interset and symlinked as per the Deployment Guide.
- Use the Analytics/Reporting Setup Notes as a reference, but note that a ZooKeeper installation is also required (which is not the case in the Deployment Guide).
- Follow the normal system preparation and ensure the interset user can SSH between all servers.

 

Install/configure ZooKeeper

cd /opt/interset
wget http://mirror.csclub.uwaterloo.ca/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
tar xvf zookeeper-3.4.6.tar.gz
sudo ln -s zookeeper-3.4.6 zookeeper
sudo mkdir -p /data/interset/zookeeper
sudo chown interset:interset /data/interset/zookeeper
cp /opt/interset/conf/zookeeper/zkEnv.sh /opt/interset/zookeeper/bin/
cp /opt/interset/conf/zookeeper/zookeeper /opt/interset/zookeeper/bin/
cp /opt/interset/conf/zookeeper/zoo.cfg /opt/interset/zookeeper/conf/zoo.cfg
sudo ln -s /opt/interset/zookeeper/bin/zookeeper /etc/init.d/
sudo chkconfig --add zookeeper

Modify zoo.cfg (an example excerpt follows this list)
- Modify autopurge.snapRetainCount to 10 (keeps 10 snapshots)
- Modify autopurge.purgeInterval to 6 (purges old snapshots every 6 hours)
- Update the server names, keeping the default ports
- Modify tickTime to 2000
- Modify dataDir to /data/interset/zookeeper
- Modify clientPort to 2181
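
Based on the settings above, a zoo.cfg for this environment would look roughly like the following excerpt (2888:3888 are the ZooKeeper default peer/election ports; adjust the hostnames to match your environment):

tickTime=2000
dataDir=/data/interset/zookeeper
clientPort=2181
autopurge.snapRetainCount=10
autopurge.purgeInterval=6
server.1=ha-ft-analytics:2888:3888
server.2=ha-ft-master01:2888:3888
server.3=ha-ft-master02:2888:3888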

Create and modify /data/interset/zookeeper/myid
- This should be a new file called "myid" containing the node number for the server.
- Add the node number (e.g. ha-ft-analytics is 1, ha-ft-master01 is 2, ha-ft-master02 is 3)
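
For example, on ha-ft-analytics the file can be created as follows (repeat on each ZooKeeper server with its own node number):

echo "1" > /data/interset/zookeeper/myid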

Push "zookeeper" to ZooKeeper bin directory of all servers & "zoo.cfg" to conf directory of all servers (e.g. using scp)

- Start ZooKeeper (sudo service zookeeper start) on each server

Confirm that one ZooKeeper instance is "Mode: leader" and the rest are "Mode: follower" by running
- sudo service zookeeper status

 

Configure Hadoop

Copy provided config files
cp /opt/interset/conf/hadoop/core-site.xml /opt/interset/hadoop/etc/hadoop/
cp /opt/interset/conf/hadoop/hdfs-site.xml /opt/interset/hadoop/etc/hadoop/
cp /opt/interset/conf/hadoop/hadoop-env.sh /opt/interset/hadoop/etc/hadoop/

- Create jn directory for Hadoop on all Journal Nodes
- mkdir -p /data/interset/hadoop/hdfs/jn

Modify Hadoop slaves file (/opt/interset/hadoop/etc/hadoop/slaves)
- Add each datanode servername on a separate line (ha-ft-data01 through ha-ft-data05)
- Push file to each Hadoop server's hadoop/etc/hadoop/ directory
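
The resulting slaves file contains one datanode hostname per line, for example:

ha-ft-data01
ha-ft-data02
ha-ft-data03
ha-ft-data04
ha-ft-data05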

Modify Hadoop core-site.xml file (/opt/interset/hadoop/etc/hadoop/core-site.xml)
- Modify fs.defaultFS to use the HDFS nameservice name (e.g. hdfs://interset)
- Modify ha.zookeeper.quorum to ZK servers (ha-ft-analytics, ha-ft-master01, ha-ft-master02) on port 2181
- Push file to each Hadoop server's hadoop/etc/hadoop/ directory
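
A sketch of the relevant core-site.xml properties, assuming the nameservice is named "interset" and the ZooKeeper client port configured earlier (2181):

<property><name>fs.defaultFS</name><value>hdfs://interset</value></property>
<property><name>ha.zookeeper.quorum</name><value>ha-ft-analytics:2181,ha-ft-master01:2181,ha-ft-master02:2181</value></property>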

Modify Hadoop hadoop-env.sh file (/opt/interset/hadoop/etc/hadoop/hadoop-env.sh)
- Modify HADOOP_HEAPSIZE value to desired amount (3096)
- Modify HADOOP_NAMENODE_INIT_HEAPSIZE value to desired amount (3096)
- Push file to each Hadoop server's hadoop/etc/hadoop/ directory
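
For example (values are in MB):

export HADOOP_HEAPSIZE=3096
export HADOOP_NAMENODE_INIT_HEAPSIZE=3096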

Modify Hadoop hdfs-site.xml file (/opt/interset/hadoop/etc/hadoop/hdfs-site.xml)
- Modify/add dfs.nameservices to interset
- Modify/add dfs.ha.namenodes.interset to nn1,nn2 (these are logical IDs, NOT actual hostnames)
- Modify/add dfs.namenode.rpc-address.interset.nn1 to namenode1 (ha-ft-master01)
- Modify/add dfs.namenode.rpc-address.interset.nn2 to namenode2 (ha-ft-master02)
- Modify/add dfs.namenode.http-address.interset.nn1 to namenode1 (ha-ft-master01)
- Modify/add dfs.namenode.http-address.interset.nn2 to namenode2 (ha-ft-master02)
- Modify/add dfs.namenode.shared.edits.dir to point to each journal node (ha-ft-analytics, ha-ft-master01, ha-ft-master02)
- Modify/add dfs.journalnode.edits.dir to /data/interset/hadoop/hdfs/jn
- Modify/add dfs.ha.automatic-failover.enabled to true
- Modify/add dfs.replication to 3
- Modify/add dfs.namenode.name.dir to file:/data/interset/hadoop/hdfs/nn
- Modify/add dfs.namenode.checkpoint.dir to /data/interset/hadoop/hdfs/snn
- Modify/add dfs.datanode.data.dir to /data/interset/hadoop/hdfs/dn
- Modify/add dfs.domain.socket.path to /home/interset/sockets/dn_socket
- Push file to each Hadoop server's hadoop/etc/hadoop/ directory
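
A partial hdfs-site.xml sketch covering the HA-related properties above (each property is shown on one line for brevity; the 8020 and 50070 ports are the usual Hadoop 2.x defaults and are assumptions, since the provided file may differ; the provided file is also assumed to contain the dfs.client.failover.proxy.provider.interset and fencing settings required for automatic failover):

<property><name>dfs.nameservices</name><value>interset</value></property>
<property><name>dfs.ha.namenodes.interset</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.interset.nn1</name><value>ha-ft-master01:8020</value></property>
<property><name>dfs.namenode.rpc-address.interset.nn2</name><value>ha-ft-master02:8020</value></property>
<property><name>dfs.namenode.http-address.interset.nn1</name><value>ha-ft-master01:50070</value></property>
<property><name>dfs.namenode.http-address.interset.nn2</name><value>ha-ft-master02:50070</value></property>
<property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://ha-ft-analytics:8485;ha-ft-master01:8485;ha-ft-master02:8485/interset</value></property>
<property><name>dfs.journalnode.edits.dir</name><value>/data/interset/hadoop/hdfs/jn</value></property>
<property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
<property><name>dfs.replication</name><value>3</value></property>

The local directory properties (dfs.namenode.name.dir, dfs.namenode.checkpoint.dir, dfs.datanode.data.dir, dfs.domain.socket.path) follow the same pattern with the paths listed above.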

Start Journal Nodes on all journalnode servers
- hadoop-daemon.sh start journalnode
- This must be done manually the first time

Format namenode on namenode1 (ha-ft-master01)
- hdfs namenode -format

Start namenode (ha-ft-master01)
- hadoop-daemon.sh start namenode
- This must be done manually the first time

Bootstrap and start the standby namenode (ha-ft-master02)
- hdfs namenode -bootstrapStandby
- hadoop-daemon.sh start namenode
- This must be done manually the first time

Start all datanodes (ha-ft-data01 through ha-ft-data05)
- hadoop-daemon.sh start datanode
- This must be done manually the first time

Initialize required state in ZooKeeper from one of the namenodes
- hdfs zkfc -formatZK

Start ZKFC on hdfs zk failover controllers (ha-ft-master01, ha-ft-master02)
- hadoop-daemon.sh start zkfc
- This must be done manually the first time

Confirm services are correctly registered as active/standby
- hdfs haadmin -getServiceState nn1
- hdfs haadmin -getServiceState nn2
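
For example, with ha-ft-master01 currently active, the expected output would be roughly:

hdfs haadmin -getServiceState nn1
active
hdfs haadmin -getServiceState nn2
standby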

Verify configuration in browser
- http://ha-ft-master01:50070

Stop Hadoop environment
- stop-dfs.sh
- Verify that all services stop

Start Hadoop environment
- start-dfs.sh
- Verify that all services start

Test failover
- Stop the active namenode
- hadoop-daemon.sh stop namenode
- Confirm the environment is still running (the standby namenode should become active)
- Restart the stopped namenode
- hadoop-daemon.sh start namenode

 

Configure HBase

Copy provided config files
cp /opt/interset/conf/hbase/* /opt/interset/hbase/conf/

Modify backup-masters (/opt/interset/hbase/conf/backup-masters)
- Add the hostname of each backup master (ha-ft-master02)
- Push file to each HBase server's hbase/conf/ directory

Modify regionservers (/opt/interset/hbase/conf/regionservers)
- Add the hostnames of each regionserver (ha-ft-data01 through ha-ft-data05)
- Push file to each HBase server's hbase/conf/ directory

Modify hbase-site.xml (/opt/interset/hbase/conf/hbase-site.xml)
- Modify/add hbase.rootdir to hdfs://interset/hbase
- Modify/add hbase.zookeeper.quorum to hostnames of each ZK server in the environment (ha-ft-analytics, ha-ft-master01, ha-ft-master02)
- Push file to each HBase server's hbase/conf/ directory
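
A sketch of the relevant hbase-site.xml properties, assuming the HDFS nameservice is "interset" (the provided file is assumed to already set hbase.cluster.distributed to true):

<property><name>hbase.rootdir</name><value>hdfs://interset/hbase</value></property>
<property><name>hbase.zookeeper.quorum</name><value>ha-ft-analytics,ha-ft-master01,ha-ft-master02</value></property>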

Modify hbase-env.sh (/opt/interset/hbase/conf/hbase-env.sh)
- Modify/add HBASE_HEAPSIZE to desired amount (24576)
- You may want to set this lower on the HBase Masters (6144)
- Modify/add HBASE_MANAGES_ZK to false
- Modify/add HADOOP_HOME to /opt/interset/hadoop
- Push file to each HBase server's hbase/conf/ directory
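
For example (heap values are in MB; use the lower value on the master nodes):

export HBASE_HEAPSIZE=24576
export HBASE_MANAGES_ZK=false
export HADOOP_HOME=/opt/interset/hadoop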

Start HBase
- /opt/interset/hbase/bin/start-hbase.sh

Verify configuration in browser
- http://ha-ft-master01:60010

 

Configure Spark

Rename slaves.template to slaves (/opt/interset/spark/conf/slaves)
- Add the hostnames of each worker (ha-ft-data01 through ha-ft-data05)
- Push file to each Spark server's spark/conf directory
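
For example, rename the template and then replace its default entry with the worker hostnames, one per line:

mv /opt/interset/spark/conf/slaves.template /opt/interset/spark/conf/slaves

ha-ft-data01
ha-ft-data02
ha-ft-data03
ha-ft-data04
ha-ft-data05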

Modify spark-env.sh
- Modify/add SPARK_DRIVER_MEMORY to desired amount (4g)
- Modify/add SPARK_WORKER_MEMORY to desired amount (24g)
- Push file to each Spark server's spark/conf directory
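
For example:

export SPARK_DRIVER_MEMORY=4g
export SPARK_WORKER_MEMORY=24g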

Start Spark
- /opt/interset/spark/sbin/start-all.sh

 

Configure Analytics/Reporting as per the Deployment Guide, with the following notes

Modify analytics/conf/interset.conf
- Ensure zkPhoenix points to both ha-ft-master01 and ha-ft-master02 in interset.conf
- Modify parallelism to half the total number of CPUs across the Spark slaves
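
The exact syntax depends on the interset.conf shipped with the release, but the intent is roughly as follows (these values are assumptions: 2181 is the ZooKeeper client port configured above, and 20 assumes five Spark slaves with eight CPUs each, i.e. half of 40):

zkPhoenix = ha-ft-master01,ha-ft-master02:2181
parallelism = 20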

Modify analytics/bin/env.sh
- Modify JAVA_HOME to the correct path
- Modify HBASE_MANAGES_ZK to false

Memory for the ingest process can be set in analytics/bin/ingest.sh
- Adjust the -Xmx8192m parameter as needed
