I am creating an HBase table in which each row represents the history of a single user. I haven't been able to find any documentation on best practices for this or on the performance cost. For some users, the number of cell versions can grow into the low millions (single-digit millions at most).
The closest thing I have found is OpenTSDB's approach of bucketing data by hour, with one row per hour. That is an option for me as well, but it would make the number of rows grow much faster. I am not sure why one layout would be preferred over the other.
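For concreteness, here is a rough sketch of what I mean by the hour-bucketed alternative. It only shows how the row key and column qualifier could be derived from a timestamp; it is not OpenTSDB's actual key format. The function name `bucketed_key`, the key layout, and the assumption that user IDs are fixed-width byte strings are all mine.

```python
import struct

HOUR_SECONDS = 3600

def bucketed_key(user_id: bytes, ts_seconds: int):
    """Return (row_key, qualifier) for an event at ts_seconds (Unix time).

    Hypothetical layout: row key = user id + start of the hour,
    qualifier = seconds elapsed within that hour.
    """
    base = ts_seconds - (ts_seconds % HOUR_SECONDS)  # start of the hour
    offset = ts_seconds - base                       # 0..3599
    # Big-endian encoding keeps one user's rows sorted by time.
    row_key = user_id + struct.pack(">I", base)
    qualifier = struct.pack(">H", offset)
    return row_key, qualifier

row_key, qualifier = bucketed_key(b"user0042", 1_400_000_123)
```

With this layout a user's history is spread over many short rows (at most 3600 cells each at one-second resolution) instead of one row with millions of versions, and a time-range read becomes a scan over a contiguous range of row keys.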
Essentially, I am wondering what kind of performance hit HBase incurs when a single row grows to millions of versions/snapshots, specifically in terms of read latency. Meanwhile, the number of rows in the table also grows at a normal rate (to a few million rows). It would also be great to know whether benchmarks for this already exist, ideally compared against similar column-oriented databases such as Cassandra.
I tried using the YCSB tool from Yahoo!, but in my experiments it does not seem to work well with the most recent HBase release.