This is the next step after you have completed the initial setup to get Hadoop running: this post walks you through getting Hive running. Everything in the prior post is a prerequisite for this setup. Just as with my prior post, this is nothing “ground breaking”, but hopefully it provides a consolidated place to look. This install takes (really) about 15 minutes, including some time to make a new clone of the system; the hard part is already done.
So you say… what is Hive?? …. “Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.”
As with the Hadoop install, there is a great deal of documentation about Hive that I will distill into something in the near future; this post is about getting it running.
Getting started
In /bin (or /usr/local/bin … or wherever you want it)
wget http://apache.tradebit.com/pub/hive/stable/hive-0.8.1.tar.gz
tar -xzvf hive-0.8.1.tar.gz
mv hive-0.8.1 hive
Environmental
Set the HIVE_HOME variable to the installation directory
Add $HIVE_HOME/bin to your PATH
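Those two steps look something like the following in your shell profile (e.g. ~/.bashrc). This is a minimal sketch that assumes you unpacked Hive into /usr/local/bin/hive; adjust the path to wherever you actually put it:

```shell
# Assumes Hive was unpacked to /usr/local/bin/hive -- adjust to your install location
export HIVE_HOME=/usr/local/bin/hive

# Put the hive launcher on your PATH
export PATH=$PATH:$HIVE_HOME/bin
```

Open a new shell (or source your profile) so the variables take effect before continuing.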
HDFS setup
There are a couple of HDFS directories that you will need (/tmp is likely already there from running the Hadoop tests)
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
GO!
$HIVE_HOME/bin/hive
If all goes well… you will get a message about what file the history is going to, and you will now have a “hive>” prompt.
Verify (DDL)
hive> create table pokes (foo int, bar string);
hive> show tables;
Load some data…
If you were in /bin/hive when you started hive… the path is relative to the starting directory
hive> load data local inpath './examples/files/kv1.txt' overwrite into table pokes;
hive> select * from pokes;
Map Reduce… you are there!
hive> select count(*), foo
> from pokes
> where foo > 0
> group by foo;
You will see the map reduce jobs execute… and then the results…