
I am working on creating an HBase table where each row represents the history of a user. I haven't been able to find any documentation on best practices or performance costs for doing this. For some users, the number of versions in a cell can grow into the low millions (single-digit millions) at most.

The closest thing I have found is OpenTSDB's implementation of one cell per hour. That is also an option for me, but it will cause the number of rows to grow, and I am not sure why one approach would be preferred over the other.
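To make the hour-bucketed alternative concrete, here is a minimal sketch of how a per-user row key could be split by hour. It is a hypothetical layout that borrows only the hourly bucketing idea; the real OpenTSDB schema encodes more than this (metric and tag IDs, flag bits in the qualifier). The class and method names (`HourlyRowKey`, `rowKey`, `qualifier`) are illustrative, not part of HBase or OpenTSDB, and no HBase client calls are shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical row-key layout: one row per (user, hour), one column per
 * second offset within that hour. Not an HBase or OpenTSDB API.
 */
final class HourlyRowKey {
    static final long SECONDS_PER_HOUR = 3600L;

    // Row key = UTF-8 user id + 0x00 separator + 8-byte big-endian hour start (epoch seconds).
    // Assumes user ids never contain a 0x00 byte and timestamps are non-negative,
    // so byte order matches time order within one user.
    static byte[] rowKey(String userId, long epochSeconds) {
        byte[] user = userId.getBytes(StandardCharsets.UTF_8);
        long hourBase = epochSeconds - Math.floorMod(epochSeconds, SECONDS_PER_HOUR);
        return ByteBuffer.allocate(user.length + 1 + Long.BYTES)
                .put(user)
                .put((byte) 0)
                .putLong(hourBase)
                .array();
    }

    // Column qualifier = 2-byte offset in seconds within the hour (0..3599).
    static byte[] qualifier(long epochSeconds) {
        return ByteBuffer.allocate(Short.BYTES)
                .putShort((short) Math.floorMod(epochSeconds, SECONDS_PER_HOUR))
                .array();
    }

    public static void main(String[] args) {
        long t = 1466035200L + 125; // 125 seconds past an hour boundary
        byte[] key = rowKey("u42", t);
        short offset = ByteBuffer.wrap(qualifier(t)).getShort();
        System.out.println("row key length: " + key.length + ", offset: " + offset);
    }
}
```

The trade-off the question asks about shows up directly here: each user's history is spread across roughly one row per active hour, so rows stay small, but reading a full history becomes a prefix scan over many rows instead of a single-row read.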

Essentially, I am wondering what kind of performance hit HBase incurs when a single row grows to millions of versions/snapshots, specifically in read latency, while the table's row count also grows at a normal rate (a few million rows). It would also be great if benchmarks on this already exist, including comparisons against similar columnar databases such as Cassandra.

I tried using Yahoo!'s YCSB tool, but in my experiments it does not seem to work well with the most recent HBase release.

Jamil Seaidoun
  • Your problem isn't quite clear to me. It seems that you don't want to limit the number of versions in your column family, and you're asking whether too many versions will affect performance? – sel-fish Jun 16 '16 at 01:35
  • That's right. I was curious whether there are any existing benchmarks or papers on performance when the number of versions of a cell grows high. – Jamil Seaidoun Jun 16 '16 at 01:57
  • I'm curious about your HBase table definition: how did you set the number of versions to unlimited? – sel-fish Jun 16 '16 at 03:35
  • It's not truly unlimited; it's just set to the maximum integer value. – Jamil Seaidoun Jun 16 '16 at 05:17
  • So what performance do you care about? When you get a row with a lot of snapshots, do you think it costs too much time? – sel-fish Jun 16 '16 at 07:09
  • I want to know the performance impact of reading a row with millions of versions compared with a similar table whose rows have significantly fewer versions. Is the impact manageable? Does it throttle the region servers, or can a region server serve the read in a reasonable amount of time? What is the average latency? I am hoping there are benchmarks on this. – Jamil Seaidoun Jun 16 '16 at 23:56
  • Maybe you can add these details to your question; then someone with related experience may notice it :) – sel-fish Jun 17 '16 at 02:11
  • Thanks for the help. I thought I was asking that in the question, but I can see how it was unclear. – Jamil Seaidoun Jun 17 '16 at 21:14
  • http://stackoverflow.com/questions/37844141/does-hbase-impose-a-maximum-size-per-row This is a similar question, I think; maybe you can get some tips if someone answers it later :) – sel-fish Jun 18 '16 at 05:51
  • By "versioning", do you mean the number of column keys in a row, or do you really mean HBase's cell value versioning feature? I think you need one user's entire history together in one row, and if that is the case, you can do it with millions of column keys. – halil Jun 21 '16 at 08:14
  • I mean the cell value versioning feature. One user's entire history is stored as versions of a single cell. – Jamil Seaidoun Jun 21 '16 at 17:36
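halil's alternative, keeping one user's whole history in a single row as many column keys rather than as versions of one cell, could look roughly like the sketch below. It is a hypothetical layout, not something stated in the thread: one column per event, with the qualifier set to `Long.MAX_VALUE - timestamp` (the common "reverse timestamp" idiom) so that HBase's unsigned byte-wise qualifier ordering lists the newest events first. `ReverseTimestampQualifier` and its methods are illustrative names; the `compareBytes` helper only mimics the ordering HBase applies, and no HBase client calls are shown.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical qualifier encoding for a wide row: one column per event, newest first. */
final class ReverseTimestampQualifier {

    // Encode an event time (epoch millis, assumed non-negative) as an 8-byte big-endian value.
    // Later events produce smaller values, so they sort earlier in byte order.
    static byte[] qualifier(long epochMillis) {
        if (epochMillis < 0) throw new IllegalArgumentException("epochMillis must be >= 0");
        return ByteBuffer.allocate(Long.BYTES).putLong(Long.MAX_VALUE - epochMillis).array();
    }

    // Recover the original event time from a qualifier.
    static long decode(byte[] qualifier) {
        return Long.MAX_VALUE - ByteBuffer.wrap(qualifier).getLong();
    }

    // Unsigned lexicographic comparison, mimicking how HBase orders qualifiers within a row.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        long[] events = {1466035200000L, 1466121600000L, 1466208000000L};
        List<byte[]> quals = new ArrayList<>();
        for (long e : events) quals.add(qualifier(e));
        quals.sort(ReverseTimestampQualifier::compareBytes);
        // Prints the event times newest first.
        for (byte[] q : quals) System.out.println(decode(q));
    }
}
```

With this layout, "latest N events" becomes a read of the first N columns of the row, instead of relying on the column family's max-versions setting to retain the whole history.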

0 Answers