Hadoop +1 (add a node that is…)

This one gets a little finicky depending on your configuration and how much horsepower you have available.  If you started off with my first post and built a VM, ideally you made a clone of the host once you had Hadoop running, which will make this “easier”.  You are quickly getting into the realm of things that really do require multiple machines and multiple hosts.  The second part of this series got us into Hive; this one is back to Hadoop and getting data spread out a bit.  So here goes…

Networking

On the clone (which will be your slave), in the VM software, go to the networking section and obtain a new MAC address.  Both guests will be bridging to the host’s adapter, and they need separate addresses in your network config to make it work.

Update the /etc/sysconfig/network-scripts/ifcfg-eth0 file with the new MAC address:
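If you would rather script that edit than open the file by hand, here is a minimal sketch. The helper name set_hwaddr and the MAC address in the usage line are made up for illustration; it assumes the MAC lives on an HWADDR= line, which is the usual key in ifcfg files.

```shell
# Hypothetical helper: rewrite (or append) the HWADDR line in an ifcfg file.
# Usage: set_hwaddr <ifcfg-file> <mac-address>
set_hwaddr() {
    file="$1"
    mac="$2"
    if grep -q '^HWADDR=' "$file"; then
        # Replace the existing line in place; sed keeps a .bak copy of the original.
        sed -i.bak "s/^HWADDR=.*/HWADDR=$mac/" "$file"
    else
        # No HWADDR line yet, so add one at the end.
        echo "HWADDR=$mac" >> "$file"
    fi
}

# Example (as root), using a made-up MAC address:
# set_hwaddr /etc/sysconfig/network-scripts/ifcfg-eth0 08:00:27:AA:BB:CC
```

Use the MAC address your VM software actually generated, not the one in the example.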

service network restart
ifconfig

You should now have a new IP for the slave.  If you don’t have eth0, or your IP is not different from your master’s, you MUST resolve that before you continue.

After you have the new hostname and IP, go through the .ssh setup (just like you did on the first host) to create a new id_rsa.pub and authorized_keys.

Once you have that, fire up the clone and add an entry in the .bash_profile to tell the hosts apart.  This will be my slave:
hostname hadoop2

Config

On the slave, edit /etc/hosts and create the IP entries for BOTH hosts.

On the master, edit /etc/hosts, create the IP entries for BOTH hosts, and COMMENT OUT the 127.0.0.1 entry.
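The /etc/hosts edits above can be sketched as a pair of small helpers. The function names, IP addresses and the hadoop1 name are my own placeholders (only hadoop2 comes from this post), so substitute whatever your hosts actually use.

```shell
# Hypothetical helpers for editing a hosts file; names and IPs below are examples only.

# Comment out the 127.0.0.1 line (master only). A .bak copy of the file is kept.
comment_out_loopback() {
    hosts_file="$1"
    sed -i.bak 's/^127\.0\.0\.1/#&/' "$hosts_file"
}

# Append "ip<TAB>name" unless an uncommented line already maps that name.
add_host() {
    hosts_file="$1"
    ip="$2"
    name="$3"
    grep -v '^#' "$hosts_file" | grep -qE "[[:space:]]$name([[:space:]]|\$)" \
        || printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}

# Example (as root) on the master, with made-up addresses:
# comment_out_loopback /etc/hosts
# add_host /etc/hosts 192.168.1.101 hadoop1
# add_host /etc/hosts 192.168.1.102 hadoop2
```

On the slave you would run only the two add_host lines, since the 127.0.0.1 entry is commented out on the master only.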

On the slave, in $HADOOP_HOME/conf/:
Put the slave IP into the slaves file
Put the master IP into the masters file
Put the IP for the master into mapred-site.xml for the mapred.job.tracker value
Put the IP for the master into core-site.xml for the fs.default.name value
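To make those four edits concrete, here is a hedged sketch that writes sample versions of the files into a scratch directory so you can compare them against your own. The function name write_sample_conf, the IP addresses and the port numbers 9000 and 9001 are all assumptions; keep whatever ports your existing core-site.xml and mapred-site.xml already use, and do not overwrite your real files if they carry other properties.

```shell
# Hypothetical helper: write sample slave-side config files into a scratch directory.
# Usage: write_sample_conf <out-dir> <master-ip> <slave-ip> [fs-port] [jobtracker-port]
write_sample_conf() {
    out_dir="$1"
    master_ip="$2"
    slave_ip="$3"
    fs_port="${4:-9000}"   # assumed port, match your existing file
    jt_port="${5:-9001}"   # assumed port, match your existing file
    mkdir -p "$out_dir"

    # One host per line in each of these files.
    echo "$slave_ip"  > "$out_dir/slaves"
    echo "$master_ip" > "$out_dir/masters"

    cat > "$out_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$master_ip:$fs_port</value>
  </property>
</configuration>
EOF

    cat > "$out_dir/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$master_ip:$jt_port</value>
  </property>
</configuration>
EOF
}

# Example with made-up addresses:
# write_sample_conf /tmp/sample-conf 192.168.1.101 192.168.1.102
```

The point to notice is that both XML values name the master, even though the files live on the slave.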

On the master, in $HADOOP_HOME/conf/:

ADD the IP address for the slave into the slaves file (there will be 2 lines here)
Check the masters file; it should have 1 entry for JUST the master host
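A quick sanity check for the master's two files might look like the sketch below. The function name check_master_conf is hypothetical; it simply counts non-blank lines and expects 2 in slaves and 1 in masters, as described above.

```shell
# Hypothetical check: the master's slaves file should list 2 hosts, masters just 1.
# Usage: check_master_conf <conf-dir>
check_master_conf() {
    conf_dir="$1"
    # Count non-blank lines; "|| true" keeps going when grep finds none.
    slaves_count=$(grep -cv '^[[:space:]]*$' "$conf_dir/slaves" || true)
    masters_count=$(grep -cv '^[[:space:]]*$' "$conf_dir/masters" || true)
    echo "slaves: $slaves_count, masters: $masters_count"
    # Succeed only when the counts are the expected ones.
    [ "$slaves_count" -eq 2 ] && [ "$masters_count" -eq 1 ]
}

# Example on the master:
# check_master_conf "$HADOOP_HOME/conf" || echo "check your slaves/masters files"
```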

Format the node

First, you need to remove the namenode format you did when you originally built this (this is, after all, a clone of the other host):
cd /tmp
rm -fr hadoop-root

$HADOOP_HOME/bin/hadoop namenode -format

Start Cluster

Restart both hosts so you start clean with no processes running.

On the master:

$HADOOP_HOME/bin/start-all.sh

It will ask you for the slave passwords, add the ssh key to the master on the first execution, etc.  As a result, you should have all the processes you would expect running on the master (jps to check).  In addition, the tasktracker and datanode processes on the slave will have been started automatically as well (jps on the slave to check).
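If you would rather not eyeball the jps output, here is a minimal sketch of a check. The function name missing_daemons is my own, and the daemon lists in the usage lines assume a stock setup of this Hadoop generation where the master is also listed in its own slaves file; adjust them to what you actually run.

```shell
# Hypothetical check: read `jps` output on stdin and print any expected daemon
# that is not running. Prints an empty line when nothing is missing.
# Usage: jps | missing_daemons <daemon-name>...
missing_daemons() {
    jps_output=$(cat)
    missing=""
    for daemon in "$@"; do
        # -w matches whole words, so "NameNode" does not match "SecondaryNameNode".
        printf '%s\n' "$jps_output" | grep -qw "$daemon" || missing="$missing $daemon"
    done
    echo "${missing# }"
}

# On the master:
# jps | missing_daemons NameNode SecondaryNameNode JobTracker DataNode TaskTracker
# On the slave:
# jps | missing_daemons DataNode TaskTracker
```

An empty result means every daemon you listed showed up in jps.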

Verify

You have already run jps on both hosts and seen the expected processes.

For the master, browse to http://_ip_address_:50070/
You should see a cluster summary with “live nodes” at 2, like this:

[Image: cluster summary]

You should also see 2 entries in the namenode view when you click on “live nodes”:
[Image: namenode view]

Also… on the master:

$HADOOP_HOME/bin/hadoop fsck /

This will give you the health of your data nodes, the status of replication, etc.  You will see the 2 data nodes listed.  Do not be alarmed that your data is “missing replicas” and that you are under-replicated; those will start making more sense once you go beyond 2 nodes.
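A little arithmetic shows where the under-replication comes from. HDFS keeps at most one replica of a given block on each data node, so with fewer data nodes than the target replication factor, every block comes up short. The sketch below assumes a target of 3, which is the HDFS default for dfs.replication; if you set a different value in your own config, plug that in instead. The function name is made up.

```shell
# Hypothetical helper: how many replicas each block is short of its target,
# given that a data node holds at most one replica of any block.
# Usage: missing_replicas <target-replication> <live-datanodes>
missing_replicas() {
    target="$1"
    datanodes="$2"
    if [ "$datanodes" -ge "$target" ]; then
        echo 0
    else
        echo $((target - datanodes))
    fi
}

missing_replicas 3 2   # prints 1: two nodes cannot hold three replicas
missing_replicas 3 3   # prints 0: a third node closes the gap
```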

If you have made it this far, you now have 2 nodes in 1 rack and (hopefully) a much better working understanding of how Hadoop and Hive each work, and how they work together.

http://hadoop.apache.org/docs/r0.20.2/hdfs_design.html#NameNode+and+DataNodes

As I mentioned in the other posts, I will be digging through some of the available documentation for both Hadoop and Hive to distill a few things.  There is a lot of documentation; between this series of posts and the ones I have planned, hopefully there is something useful in what I have put together to get you up and running with relative ease.
