I'm new to HBase, and I'm running into a problem when bulk loading data from a text file into HBase. Assume I have the following table:
Key_id | f1:c1 | f2:c2
row1   | 'a'   | 'b'
row1   | 'x'   | 'y'
- When I parse these two records and put them into HBase at the same time (with the same timestamp), only the version
{row1 'x' 'y'}
is kept. Here is the explanation from the HBase documentation:
When you put data into HBase, a timestamp is required. The timestamp can be generated automatically by the RegionServer or can be supplied by you. The timestamp must be unique per version of a given cell, because the timestamp identifies the version. To modify a previous version of a cell, for instance, you would issue a Put with a different value for the data itself, but the same timestamp.
I'm thinking about specifying the timestamps myself, but I don't know how to set timestamps automatically during bulk loading, and whether doing so affects loading performance. I need the fastest and safest import process for big data.
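For example, is something like this the right direction? Just a sketch of my idea: the table name 'mytable' and the values are placeholders, and I bump the timestamp by 1 for the second record so that both versions survive instead of one overwriting the other.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimestampedPut {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mytable"))) {
            long ts = System.currentTimeMillis();
            // Same row key for both records: give each Put its own timestamp
            // so they are stored as two versions of the same cells.
            Put p1 = new Put(Bytes.toBytes("row1"), ts);
            p1.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("c1"), Bytes.toBytes("a"));
            p1.addColumn(Bytes.toBytes("f2"), Bytes.toBytes("c2"), Bytes.toBytes("b"));
            Put p2 = new Put(Bytes.toBytes("row1"), ts + 1); // bumped timestamp
            p2.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("c1"), Bytes.toBytes("x"));
            p2.addColumn(Bytes.toBytes("f2"), Bytes.toBytes("c2"), Bytes.toBytes("y"));
            table.put(p1);
            table.put(p2);
        }
    }
}
```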
- I also tried to parse the file and put each record into the table one at a time, but the speed was very, very slow. So another question is: how many records, or how much data, should go into a batch before putting it into HBase? (A sketch of what I mean by batching is below.) I wrote a simple Java program to do the puts, and it is much slower than importing with the ImportTsv tool from the command line. I don't know exactly what batch size that tool uses.
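For reference, my current program calls table.put(p) for every single record. This is roughly what I'm guessing I should do instead; the table name 'mytable', the generated rows, and the batch size of 1000 are all just guesses for illustration:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedPuts {
    static final int BATCH_SIZE = 1000; // a guess; the right value depends on row size

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mytable"))) {
            List<Put> batch = new ArrayList<>(BATCH_SIZE);
            long ts = System.currentTimeMillis();
            for (int i = 0; i < 100_000; i++) { // stand-in for reading parsed lines
                Put p = new Put(Bytes.toBytes("row" + i), ts + i);
                p.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("c1"), Bytes.toBytes("v" + i));
                batch.add(p);
                if (batch.size() >= BATCH_SIZE) {
                    table.put(batch); // one call sends the whole batch to the server
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                table.put(batch); // send the remainder
            }
        }
    }
}
```

I've also read that there is a BufferedMutator class that buffers writes on the client side automatically, but I haven't tried it yet.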
Many thanks for your advice!