This is the next step after you have completed the initial setup to get Hadoop running: this post walks you through getting Hive running. Everything in the prior post is a prerequisite for this setup. Just as with my prior post, this is nothing “ground breaking”, but hopefully it provides a consolidated place to look. This install takes (really) about 15 minutes, including some time to make a new clone of the system; the hard part is already done.
So you say… what is Hive?? …. “Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.”
As with the Hadoop install, there is a great deal of documentation about Hive that I will distill into something in the near future; this post is about getting it running.
Getting started
In /bin (or /usr/local/bin … or wherever you want it)
wget http://apache.tradebit.com/pub/hive/stable/hive-0.8.1.tar.gz
tar -xzvf hive-0.8.1.tar.gz
mv hive-0.8.1 hive
Environmental
Set the HIVE_HOME variable to the installation directory
Add $HIVE_HOME/bin to your PATH
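Those two steps look something like the following in your shell profile (e.g. ~/.bashrc). This is a minimal sketch that assumes you unpacked Hive into /usr/local/bin/hive; adjust the path to wherever you actually put it:

```shell
# Assumes Hive was unpacked to /usr/local/bin/hive -- adjust to your install location
export HIVE_HOME=/usr/local/bin/hive

# Put the hive launcher on your PATH
export PATH=$PATH:$HIVE_HOME/bin
```

Open a new shell (or source your profile) so the variables take effect before continuing.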
HDFS setup
There are a couple of HDFS directories that you will need (/tmp is likely already there from running the Hadoop tests)
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
GO!
$HIVE_HOME/bin/hive
If all goes well… you will get a message about what file the history is going to, and you will now have a “hive>” prompt.
Verify (DDL)
hive> create table pokes (foo int, bar string);
hive> show tables;
Load some data…
If you were in /bin/hive when you started hive… the path is relative to the starting directory
hive> load data local inpath './examples/files/kv1.txt' overwrite into table pokes;
hive> select * from pokes;
Map Reduce… you are there!
hive> select count(*), foo
> from pokes
> where foo > 0
> group by foo;
You will see the map reduce jobs execute… and then the results…