
I have installed Hadoop and HBase (CDH3u2). In Hadoop I have a file at the path /home/file.txt. It contains data like this:

one,1
two,2
three,3

I want to import this file into HBase. The first field should be parsed as a string and the second field as an integer, and then the row should be written to HBase. How can I do this?

Thanks in advance.

Donald Miner
Nageswaran

1 Answer


I like using Apache Pig for ingest into HBase because it is simple, straightforward, and flexible.

Here is a Pig script that would do the job for you, after you have created the table and the column family. To create the table and the column family, you'll do:

$ hbase shell
> create 'mydata', 'mycf'

Move the file to HDFS:

$ hadoop fs -put /home/file.txt /user/surendhar/file.txt

Then write a Pig script that stores the data with HBaseStorage (you may have to look up how to set up and run Pig):

A = LOAD 'file.txt' USING PigStorage(',') as (strdata:chararray, intdata:long);
STORE A INTO 'hbase://mydata'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
              'mycf:intdata');

Note that in the above script, the row key is going to be strdata. If you want to build the key from something else, use a FOREACH statement to generate it. HBaseStorage treats the first field of the relation being stored (A::strdata in this case) as the row key.


Some other options would be:

  • Write a Java MapReduce job to do the same thing as above.
  • Interact with the table directly through the HBase client API (HTable) and put the data in row by row. This should only be done with much smaller files.
  • Push the data up with the hbase shell using some sort of script (e.g., sed, perl, python) that transforms the lines of CSV into shell put commands. Again, this should only be done if the number of records is small.

    $ cat /home/file.txt | transform.pl
    put 'mydata', 'one', 'mycf:intdata', '1'
    put 'mydata', 'two', 'mycf:intdata', '2'
    put 'mydata', 'three', 'mycf:intdata', '3'
    
    $ cat /home/file.txt | transform.pl | hbase shell
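
The transform script itself is not shown above, so here is a minimal sketch of what it could look like in Python instead of Perl. The function name `transform` and the file name `transform.py` are made up for illustration; the table and column names are taken from the earlier steps. It checks that the second field really is an integer and assumes row keys contain no single quotes. Keep in mind that a shell put stores the value as the text you pass it, not as a binary integer.

```python
import sys

TABLE = "mydata"          # table created earlier in the hbase shell
COLUMN = "mycf:intdata"   # column family and qualifier used in the Pig script


def transform(lines, table=TABLE, column=COLUMN):
    """Yield one hbase shell put command per non-blank 'key,number' line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(",", 1)
        number = int(value)  # raises ValueError if the field is not an integer
        # Assumption: the key has no single quotes, so no escaping is done.
        yield "put '%s', '%s', '%s', '%d'" % (table, key, column, number)


# Demo on the sample rows from the question.
sample = ["one,1", "two,2", "three,3"]
for command in transform(sample):
    print(command)
```

To use it in the pipeline above, iterate over `sys.stdin` instead of `sample` and run something like `cat /home/file.txt | python transform.py | hbase shell`.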
    
Donald Miner
  • Hey Donald. Would you please check out this post? http://stackoverflow.com/questions/21126483/how-to-have-pig-store-rows-in-hbase-as-text-and-not-bytes – Matthew Moisen Jan 14 '14 at 23:50
  • Donald you are a hero for writing this answer! – Alex Dean Apr 25 '14 at 17:32
  • Do not forget to register the required HBase jars in that Pig script, like this: "REGISTER /usr/lib/hbase/lib/*.jar;" – PinoSan Jun 22 '14 at 22:41
  • @Donald I tried this, but in HBase I am getting only 1 row, whereas my logs say `851 files are moved to hbase`. Please help me – animal Nov 10 '16 at 14:08