
I ran some benchmarks on Put performance from a Java client, but the results are not clear to me.

Here's the problem: what is the best way to do puts in HBase? A single Put with 1000 columns (4 families), or 1000 Puts with a single column each? Or maybe 4 Puts with 250 columns each?

In theory, what would be the best strategy?

PS: I can't use batch because I need the WALs for Solr.

Thanks.

SBA

2 Answers


First of all, use as few column families as you can (I have provided details in this answer). Second, you must specify not only your write patterns but also your read patterns. HBase works best for "write once, read many" scenarios, so you want to design your table so that it provides the fastest access to the data. This criterion will determine whether you need a "tall" or a "wide" table. Check out the HBase table design chapter of "HBase in Action".

gorros
  • Thank you @gorros for your answer. The question is not about schema design in HBase. I have already spent much time on this area. The minimal number of families I managed to keep in the final solution is 4. They do not have the same "width", but they respect many best practices of HBase design. – SBA Jul 21 '17 at 14:15
  • Sure, I understand. Maybe I did not grasp the context of the question properly. – gorros Jul 21 '17 at 14:43

To get good write performance, you should use one Put per row. Otherwise, performance will degrade significantly, because HBase takes a lock on the row key for each Put, so with many Puts against the same row a lot of time is wasted on synchronization. With a single Put per row, write performance will be comparable to bulk load.
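
As a rough illustration of "one Put per row", here is a minimal sketch that groups pending cell writes by row key, so that each row can then be written with a single Put. `PutGrouping`, `CellToWrite` and `groupByRow` are hypothetical names made up for this example and are not part of the HBase API; only the calls shown in the comment (`Put`, `addColumn`, `Bytes.toBytes`, `Table.put`) belong to the real client, and they are left as comments so the sketch runs without a cluster.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PutGrouping {

    /** Hypothetical holder for one value to write; not an HBase type. */
    public static final class CellToWrite {
        public final String row;
        public final String family;
        public final String qualifier;
        public final String value;

        public CellToWrite(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Groups cells by row key, keeping rows in first-seen order. */
    public static Map<String, List<CellToWrite>> groupByRow(List<CellToWrite> cells) {
        Map<String, List<CellToWrite>> byRow = new LinkedHashMap<>();
        for (CellToWrite cell : cells) {
            byRow.computeIfAbsent(cell.row, k -> new ArrayList<>()).add(cell);
        }
        return byRow;
    }

    public static void main(String[] args) {
        // 1000 columns spread over 4 families, all for the same row (as in the question).
        String[] families = {"f1", "f2", "f3", "f4"};
        List<CellToWrite> cells = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            cells.add(new CellToWrite("row-1", families[i % 4], "q" + i, "v" + i));
        }

        Map<String, List<CellToWrite>> byRow = groupByRow(cells);

        // With the real HBase client, each map entry would become ONE Put, roughly:
        //   Put put = new Put(Bytes.toBytes(rowKey));
        //   for each cell: put.addColumn(Bytes.toBytes(family),
        //                                Bytes.toBytes(qualifier),
        //                                Bytes.toBytes(value));
        //   table.put(put);
        System.out.println(byRow.size() + " Put(s) needed for " + cells.size() + " cells");
    }
}
```

Grouping first means the row is locked once per row rather than once per column, which is the effect described above.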

Alexander Kuznetsov
  • Went from 17 s per huge row on a minimal test environment to 1.5 s. Thanks a lot. – SBA Jul 21 '17 at 12:24