Loading csv data into Hbase

Question

I am very new to hadoop and hbase and have some conceptual questions that are tripping me up during every tutorial I've found.

I have hadoop and hbase running on a single node within a ubuntu VM on my win 7 system. I have a csv file that I would like to load into a single hbase table.

The columns are: loan_number, borrower_name, current_distribution_date, loan_amount

I know that I need to write a MapReduce job to load this said csv file into hbase. The following tutorial describes the Java needed to write this MapReduce job. http://salsahpc.indiana.edu/ScienceCloud/hbase_hands_on_1.htm

What I'm missing is:

Where do I save these files and where do I compile them? Should I compile this on my win 7 machine running visual studio 12 and then move it to the ubuntu vm?

I read this SO question and answers but I guess I'm still missing the basics: Loading CSV File into Hbase table using MapReduce

I can't find anything covering these basic hadoop/hbase logistics. Any help would be greatly appreciated.

score 15 · Accepted Answer · answered Dec 18 '12 at 14:48

There is no need to code a MapReduce job to bulk load data into HBase. There are several ways to bulk load data into HBase:

1) Use HBase tools like importtsv and completebulkload http://hbase.apache.org/book/arch.bulk.load.html

2) Use Pig to bulk load data. Example:

A = LOAD '/hbasetest.txt' USING PigStorage(',') as 
      (strdata:chararray, intdata:long);
STORE A INTO 'hbase://mydata'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
              'mycf:intdata');

3) Do it programatically using the HBase API. I got a small project called hbaseloader that loads files into a HBase table (table it has just one ColumnFamily with the content of the file). Take a look at it, you just need to define the structure of your table and modified the code to read a csv file and parse it.

4) Do it programatically using a MapReduce job like in the example you mentioned.

Thanks! I'm definitely going to explore those options. – bjoern Dec 19 '12 at 17:01 — bjoern, Dec 19 '12 at 17:01

score 2 · Answer 2 · answered Dec 17 '12 at 06:00

Where do I save these files and where do I compile them? Should I compile this on my win 7 machine running visual studio 12 and then move it to the ubuntu vm?

You can save the Map Reduce classes in anywhere (Either in Win 7 or Ubuntu VM). You can compile it anywhere too. Just make a Jar file with the classes you created and you that jar to run the map reduce in your VM.

Then in your Ubuntu VM after starting Hadoop you can use the following command to run the map reduce class you created.

<Path To Hadoop Bin>/hadoop jar <Path to Jar>/<Jar Name>.jar <Map Reduce Class Name> <Class Arguments> ...

When you run the above command the the Map Reduce class you wrote will be executed along with the Hbase table will be populated.

Hope this helps

Loading csv data into Hbase

2 Answers2

Linked