
I am designing an app to run on HBase and want to interactively explore the contents of my cluster. I am in the hbase shell and I want to scan all keys starting with the characters "abc". Such keys might include "abc4", "abc92", "abc20014", etc. I tried a scan:

hbase(main):003:0> scan 'mytable', {STARTROW => 'abc', ENDROW => 'abc'}

But this does not seem to return anything, since there is technically no rowkey "abc", only rowkeys starting with "abc".

What I want is something like

hbase(main):003:0> scan 'mytable', {STARTSROWPREFIX => 'abc', ENDROWPREFIX => 'abc'}

I hear HBase can do this quickly and that this is one of its main selling points. How do I do this in the hbase shell?

David Williams

4 Answers


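So it turns out to be very easy. The scan range is half-open: the start row is inclusive and the end row is exclusive, so the logic is start <= key < end. The end row therefore has to be the first key that sorts after every "abc..." key, which is "abd". So the answer is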

scan 'mytable', {STARTROW => 'abc', ENDROW => 'abd'}
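
To see why 'abd' works as the end row, here is a small sketch of the start <= key < end rule applied to a sorted list of keys. It is only a model of the range logic in plain Python, not HBase code; the function name `range_scan` and the sample keys are made up for illustration.

```python
from bisect import bisect_left

def range_scan(sorted_keys, start_row, stop_row):
    """Return the keys k with start_row <= k < stop_row.

    The start is inclusive and the stop is exclusive, mirroring the
    start <= key < end rule described in the answer above.
    """
    lo = bisect_left(sorted_keys, start_row)  # first key >= start_row
    hi = bisect_left(sorted_keys, stop_row)   # first key >= stop_row
    return sorted_keys[lo:hi]

# Hypothetical row keys, held in byte-wise sorted order.
keys = sorted([b"abb9", b"abc20014", b"abc4", b"abc92", b"abd", b"abd1", b"xyz"])

# Every key starting with "abc" sorts at or after "abc" and before "abd".
print(range_scan(keys, b"abc", b"abd"))

# With identical start and stop the half-open range is empty in this model.
print(range_scan(keys, b"abc", b"abc"))
```

Note that the key "abd" itself is excluded, because the end of the range is exclusive.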
David Williams
  • That's right - looks like you found this out the hard way. :) Do you want to mark this as the right answer? – Suman Jul 10 '13 at 20:24
  • however hbase doc should say that startrow is actually startrowprefix – nir May 12 '14 at 21:15
  • If your rows only use 'ASCII' values then it is as simple as you describe here. If you really use binary rowkeys then it becomes a lot more difficult. Check https://issues.apache.org/jira/browse/HBASE-11990 to see the discussion and the edge cases that trying to create a generic solution brought to light. – Niels Basjes Sep 29 '14 at 15:42
  • Does this {STARTROW => 'abc', ENDROW => 'abd'} have a Java API equivalent? I've only managed to find PrefixFilter and this range-like approach would suit me better – Matt Aug 30 '21 at 19:21

In recent versions of HBase you can now do this in the hbase shell:

scan 'mytable', {ROWPREFIXFILTER => 'abc'}

This is effectively equivalent to the following (and it also works for binary row keys):

scan 'mytable', {STARTROW => 'abc', ENDROW => 'abd'}

This method is a LOT more efficient than the "PrefixFilter" approach, because the latter puts all records through the comparison code that is present in the PrefixFilter class.
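
For binary prefixes, turning 'abc' into 'abd' by hand gets trickier when the prefix ends in 0xFF bytes. The sketch below shows one way to derive an exclusive end row from a prefix. It illustrates the idea only and is not the code HBase ships; the function name `stop_row_for_prefix` is hypothetical, and treating an empty end row as "scan to the end of the table" is an assumption of this sketch.

```python
def stop_row_for_prefix(prefix: bytes) -> bytes:
    """Return the smallest key that sorts after every key starting with prefix.

    Trailing 0xFF bytes cannot be incremented, so they are dropped first and
    the last remaining byte is incremented. An empty result means there is no
    upper bound (assumed here to mean "scan to the end of the table").
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        # Empty prefix, or a prefix made only of 0xFF bytes: no upper bound.
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])

print(stop_row_for_prefix(b"abc"))       # b'abd'
print(stop_row_for_prefix(b"ab\xff"))    # b'ac'
print(stop_row_for_prefix(b"\xff\xff"))  # b''
```

The ASCII case reduces to the 'abc' / 'abd' pair used above; the 0xFF cases are the kind of edge case that makes binary keys harder.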

Niels Basjes
  • I'm having trouble understanding the purpose of the PrefixFilter, when startrow and stoprow appear to be superior. Do you know of any use cases? I've also heard that people combine all three. – Matthew Moisen Oct 22 '16 at 19:13
  • I never use the PrefixFilter at all anymore. Perhaps there is a good reason to use it when doing something in a coprocessor, otherwise I would even vote to remove the class from HBase altogether. – Niels Basjes Oct 22 '16 at 19:56
  • Unfortunately I've been using it this whole time because I mistakenly assumed that you needed to have an exact match on the start and end rows. I ran a test on 5 million rows divided between 26 different rowkey prefixes, and the prefix filter is about 300% slower for me on average. Now I'm spending my Saturday refactoring all of my jobs :) – Matthew Moisen Oct 23 '16 at 01:17
  • Not sure if you would know the answer to this, but I figured I would send it your way: http://stackoverflow.com/questions/40197883/how-does-the-use-of-startrow-and-stoprow-not-result-in-a-full-table-scan-in-hbas – Matthew Moisen Oct 23 '16 at 01:19

The accepted solution won't work in all cases (binary keys). In addition, using a PrefixFilter can be slow because it performs a table scan until it reaches the prefix. A more performant solution is to use a STARTROW and a FILTER like so:

 scan 'my_table', {STARTROW => 'abc', FILTER => "PrefixFilter('abc')"}
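
A rough way to picture the difference is to count how many rows each approach has to look at. The sketch below is a toy model over a sorted list, not a measurement of HBase: the function name `scan_with_prefix_filter` and the sample keys are hypothetical, and it assumes the filter can stop as soon as a key sorts past the prefix.

```python
from bisect import bisect_left

def scan_with_prefix_filter(sorted_keys, prefix, start_row=b""):
    """Return (matches, rows_examined) for a prefix filter over sorted keys.

    The scan begins at the first key >= start_row, tests each key against the
    prefix, and stops once a key sorts past the prefix (an assumption of this
    model), since no later key can match.
    """
    matches = []
    examined = 0
    for key in sorted_keys[bisect_left(sorted_keys, start_row):]:
        examined += 1
        if key.startswith(prefix):
            matches.append(key)
        elif key > prefix:
            break  # sorted past the prefix range
    return matches, examined

# Hypothetical table: 1000 rows that sort before "abc", three matches, one row after.
keys = sorted([b"aaa%04d" % i for i in range(1000)]
              + [b"abc20014", b"abc4", b"abc92", b"abd0"])

# Filter only: walks every row from the start of the table.
print(scan_with_prefix_filter(keys, b"abc")[1])                    # 1004

# Start row plus filter: jumps straight to the first candidate key.
print(scan_with_prefix_filter(keys, b"abc", start_row=b"abc")[1])  # 4
```

Both calls return the same three matching keys; only the number of rows examined differs.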
Ben English
  • I'm having trouble understanding the purpose of the `PrefixFilter`, when `startrow` and `stoprow` appear to be superior. Do you know of any usecases? I've also heard that people combine all three. – Matthew Moisen Oct 22 '16 at 19:12
  • This is the solution that worked for me. My key is composed of AAA_B_CCC. I needed all the rows where the key started with AAA_. – Amro Younes May 10 '17 at 21:02

I think what you need is a filter.

Check out the answer to the following question: Scan with filter using HBase shell

More filters are listed at http://hbase.apache.org/book/client.filter.html

Mehul Rathod
  • I am under the impression filters are much slower than range scans: http://stackoverflow.com/questions/10942638/should-i-user-prefixfilter-or-rowkey-range-scan-in-hbase. Is there a way to do this with a range scan? – David Williams Jul 09 '13 at 21:41
  • @DavidWilliams : Yes, range queries are faster. – Tariq Jul 10 '13 at 03:16