From Hadoop to Hive

This is the next step after you have completed the initial setup to get Hadoop running; it will walk you through the steps to get Hive running. Everything in the prior post is a prerequisite to this setup. Just as with my prior post, this is nothing “ground breaking”, but hopefully it provides a consolidated place to look. This install takes (really) about 15 minutes, including some time to make a new clone of the system; the hard part is already done.

So you say… what is Hive? Per the project’s own description: “Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.”

As with the Hadoop install, there is a great deal of documentation about Hive that I will distill into something in the near future; this post is about getting it running.

Getting started

In /bin  (or /usr/local/bin … or wherever you want it)


gunzip and tar -xvf the downloaded release tarball (hive-0.8.1.tar.gz)

mv hive-0.8.1 hive


Set the HIVE_HOME variable to the installation directory

Add $HIVE_HOME/bin to your PATH
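The steps above can be sketched as a short shell session. The tarball name follows the hive-0.8.1 release used below; /usr/local/bin is just one reasonable choice of location:

```shell
# In /usr/local/bin (or wherever you downloaded the release):
if [ -f hive-0.8.1.tar.gz ]; then
  tar -xzf hive-0.8.1.tar.gz   # gunzip + tar -xvf in one step
  mv hive-0.8.1 hive
fi

# Point HIVE_HOME at the install and put its bin directory on the PATH.
# Add these two lines to ~/.bashrc (or similar) so they survive logout.
export HIVE_HOME=/usr/local/bin/hive
export PATH=$PATH:$HIVE_HOME/bin
```

If you unpacked somewhere else, just adjust HIVE_HOME to match.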

HDFS setup

There are a couple of HDFS directories that you will need (/tmp is likely already there from running the Hadoop tests):

$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse



Now start the Hive CLI by running hive (or $HIVE_HOME/bin/hive). If all goes well… you will get a message about which file the session history is going to, and you will now have a “hive>” prompt.

Verify (DDL)

hive> create table pokes (foo int, bar string);

hive> show tables;

Load some data…

If you were in /bin/hive when you started Hive… the path is relative to that starting directory.

hive> load data local inpath './examples/files/kv1.txt' overwrite into table pokes;

hive> select * from pokes;
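The same DDL, load, and select can also be run non-interactively with the CLI's -f flag. A sketch, assuming hive is on your PATH and you start from the Hive install directory (so the relative example path resolves):

```shell
# Collect the statements in a script file...
cat > /tmp/pokes.sql <<'EOF'
create table pokes (foo int, bar string);
show tables;
load data local inpath './examples/files/kv1.txt' overwrite into table pokes;
select * from pokes;
EOF

# ...and hand it to the CLI (guarded so this is a no-op if Hive is absent).
if command -v hive >/dev/null 2>&1; then
  hive -f /tmp/pokes.sql
fi
```

Handy once you start scripting jobs instead of typing at the prompt.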


Map Reduce… you are there!

hive> select count(*), foo
> from pokes
> where foo > 0
> group by foo;

You will see the MapReduce jobs execute… and then the results…
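The aggregate above can likewise be fired off in one shot with -e instead of typing it at the prompt; again a sketch, assuming hive is on your PATH:

```shell
# One-shot query execution; -e takes the HiveQL as a string.
QUERY='select count(*), foo from pokes where foo > 0 group by foo'
if command -v hive >/dev/null 2>&1; then
  hive -e "$QUERY"
fi
```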
