This one gets a little finicky depending on your configuration and how much horsepower you have available. If you started off with my first post and built a VM, ideally you made a clone of the host once you had Hadoop running, which will make this “easier”. You are quickly getting into the realm of things that really will require multiple machines and multiple hosts. The second part of this series got us into Hive; this one is back to Hadoop and getting data spread out a bit. So here goes…
Networking
On the clone (which will be your slave), go to the networking section of the VM software and obtain a new MAC address. Both guests will be bridging to the host’s adapter, and they need separate addresses in your network config to make it work.
Update the /etc/sysconfig/network-scripts/ifcfg-eth0 file with the new MAC address, then restart networking so the change is picked up and check the result:
service network restart
ifconfig
You should now have a new IP for the slave. If you don’t have eth0, or your IP is not different from your master’s, you MUST resolve that before you continue.
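If you would rather script the MAC address update than edit the file by hand, here is a minimal sketch. It assumes your ifcfg-eth0 already contains an HWADDR= line (yours may not), and set_hwaddr is just a helper name I made up for illustration. The MAC in the example is a placeholder, not a real value.

```shell
# Replace the HWADDR line in an ifcfg file, keeping a .bak copy of the original.
# Assumption: the file already has a line starting with HWADDR=.
# Usage: set_hwaddr <ifcfg-file> <new-mac>
set_hwaddr() {
    cfg_file="$1"
    new_mac="$2"
    sed -i.bak "s/^HWADDR=.*/HWADDR=${new_mac}/" "$cfg_file"
}

# Example (as root on the slave; the MAC below is a made-up placeholder):
# set_hwaddr /etc/sysconfig/network-scripts/ifcfg-eth0 08:00:27:AA:BB:CC
```

Use the MAC address your VM software generated, then restart networking and check ifconfig as described above.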
After you have the new host name and IP, go through the .ssh setup (just like you did on the first host) to create a new id_rsa.pub and authorized_keys.
Once you have that, fire up the clone and add an entry in the .bash_profile to tell the hosts apart. This will be my slave:
hostname hadoop2
Config
On the slave, edit /etc/hosts and create the IP entries for BOTH hosts
On the master, edit /etc/hosts, create the IP entries for BOTH hosts, and COMMENT OUT the 127.0.0.1 entry
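As a rough sketch of the /etc/hosts edits, something like the following would do it. The IP addresses and the hadoop1 name for the master are placeholders I made up (only hadoop2 comes from this post), and the two function names are hypothetical helpers, so swap in your own values.

```shell
# Hypothetical addresses and master host name; substitute your own.
MASTER_IP="192.168.1.101"
SLAVE_IP="192.168.1.102"

# Append entries for BOTH hosts to a hosts file (default: /etc/hosts).
add_cluster_hosts() {
    hosts_file="${1:-/etc/hosts}"
    printf '%s\thadoop1\n%s\thadoop2\n' "$MASTER_IP" "$SLAVE_IP" >> "$hosts_file"
}

# Master only: comment out the 127.0.0.1 line, keeping a .bak copy.
comment_out_loopback() {
    hosts_file="${1:-/etc/hosts}"
    sed -i.bak 's/^127\.0\.0\.1/#&/' "$hosts_file"
}

# Example (as root):
#   add_cluster_hosts          # on both hosts
#   comment_out_loopback       # on the master only
```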
On the slave, in $HADOOP_HOME/conf/:
Put the slave IP into the slaves file
Put the master IP into the masters file
Put the IP for the master into mapred-site.xml for the mapred.job.tracker value
Put the IP for the master into core-site.xml for the fs.default.name value
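For the two XML files, the property blocks end up looking roughly like what this snippet prints. The IP and both port numbers are assumptions on my part: keep whatever ports your single-node setup from the first post already uses and change only the host part. print_site_properties is just a made-up helper that echoes the fragments so you can paste them in; it does not touch your config files.

```shell
# Hypothetical values; keep the ports your existing config already uses.
MASTER_IP="192.168.1.101"
FS_PORT="9000"
JT_PORT="9001"

# Print the two property blocks, ready to paste inside <configuration>.
print_site_properties() {
    cat <<EOF
<!-- core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://${MASTER_IP}:${FS_PORT}</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>${MASTER_IP}:${JT_PORT}</value>
</property>
EOF
}

print_site_properties
```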
On the master, in $HADOOP_HOME/conf/:
ADD the IP address for the slave into the slaves file (there will be 2 lines here)
Check the masters file; it should have 1 entry for JUST the master host
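To make the end state of the slaves and masters files concrete, here is a small sketch that writes them out. The IPs are placeholders, the function names are hypothetical, and I am assuming the first line of the master's slaves file is the master's own IP.

```shell
# Hypothetical addresses; substitute your own.
MASTER_IP="192.168.1.101"
SLAVE_IP="192.168.1.102"

# On the master: slaves lists both hosts, masters lists only the master.
write_master_lists() {
    conf_dir="$1"
    printf '%s\n%s\n' "$MASTER_IP" "$SLAVE_IP" > "$conf_dir/slaves"
    printf '%s\n' "$MASTER_IP" > "$conf_dir/masters"
}

# On the slave: slaves holds the slave IP, masters holds the master IP.
write_slave_lists() {
    conf_dir="$1"
    printf '%s\n' "$SLAVE_IP" > "$conf_dir/slaves"
    printf '%s\n' "$MASTER_IP" > "$conf_dir/masters"
}

# Example:
#   write_master_lists "$HADOOP_HOME/conf"   # on the master
#   write_slave_lists  "$HADOOP_HOME/conf"   # on the slave
```

Both functions overwrite the existing files, so back them up first if you have anything else in there.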
Format the node
First, you need to remove the namenode format you did when you originally built this (this is, after all, a clone of the other host), and then format again:
cd /tmp
rm -fr hadoop-root
$HADOOP_HOME/bin/hadoop namenode -format
Start Cluster
Restart both hosts so you start clean with no processes running.
On the master, run $HADOOP_HOME/bin/start-all.sh
It will ask you for the slave password, and it will add the ssh key to the master on the first execution. As a result, you should have all the processes running on the master that you would expect (jps to check). In addition, the tasktracker and datanode processes on the slave are started automatically as well (jps on the slave to check).
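If you don't want to eyeball the jps output, a quick check along these lines can flag anything that did not start. missing_daemons is a helper name I made up, and the daemon lists in the examples assume the master also runs a datanode and tasktracker (it is listed in its own slaves file), so adjust them to your layout.

```shell
# Read jps-style output on stdin and report any expected daemon that is absent.
# Usage: jps | missing_daemons <DaemonName>...
missing_daemons() {
    jps_output=$(cat)
    for name in "$@"; do
        # -w keeps NameNode from matching inside SecondaryNameNode
        printf '%s\n' "$jps_output" | grep -qw "$name" || echo "missing: $name"
    done
}

# Examples:
#   jps | missing_daemons NameNode SecondaryNameNode JobTracker DataNode TaskTracker   # master
#   jps | missing_daemons DataNode TaskTracker                                         # slave
```

No output means everything you asked about is running.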
Verify
You have already run jps on both hosts and seen the expected processes.
For the master, browse to http://_ip_address_:50070/
You should see something like this, with “live nodes” showing 2:
You should also see 2 entries in the namenode page when you click on “live nodes”
Also… on the master:
$HADOOP_HOME/bin/hadoop fsck /
This will give you the health of your data nodes, the status of replication, and so on, and you will see the 2 data nodes listed. Do not be alarmed that your data is “missing replicas” and that you are under-replicated. Those will start making more sense once you go beyond 2 nodes.
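The under-replication warnings are what you would expect if dfs.replication is still at the HDFS default of 3: with only 2 data nodes there is nowhere to put a third copy. To pull just the interesting lines out of the fsck report, a filter like this can help. The label strings are what I would expect from this generation of Hadoop, but treat them as an assumption and adjust the pattern if your output differs; fsck_summary is a made-up helper name.

```shell
# Keep only the summary lines of an fsck report read on stdin.
# Assumption: the labels below match the wording in your Hadoop version.
fsck_summary() {
    grep -E 'Status:|Under-replicated blocks:|Missing replicas:|Default replication factor:|Number of data-nodes:|Number of racks:'
}

# Example:
#   $HADOOP_HOME/bin/hadoop fsck / | fsck_summary
```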
If you have made it this far, you now have 2 nodes in 1 rack and (hopefully) a much better working understanding of how Hadoop and Hive both work, and how they work together.
As I mentioned in the other posts, I will be digging through some of the available documentation for both Hadoop and Hive to distill a few things. There is a lot of documentation. Hopefully, between this series of posts and the ones I have planned, there is something useful in what I have put together to get you up and running with relative ease.