HBase (Easy): How to Perform Range Prefix Scan in hbase shell

Question

I am designing an app to run on hbase and want to interactively explore the contents of my cluster. I am in the hbase shell and I want to perform a scan of all keys starting with the chars "abc". Such keys might include "abc4", "abc92", "abc20014", etc. I tried this scan:

hbase(main):003:0> scan 'mytable', {STARTROW => 'abc', ENDROW => 'abc'}

But this does not seem to return anything, since there is technically no rowkey "abc", only rowkeys starting with "abc".

What I want is something like

hbase(main):003:0> scan 'mytable', {STARTROWPREFIX => 'abc', ENDROWPREFIX => 'abc'}

I hear HBase can do this quickly and that this is one of its main selling points. How do I do this in the hbase shell?

score 57 · Accepted Answer · answered Jul 09 '13 at 21:46

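So it turns out to be very easy. The start of a scan range is inclusive and the end is exclusive: the logic is start <= key < end. So the stop row is the prefix with its last character incremented by one, and the answer is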

scan 'mytable', {STARTROW => 'abc', ENDROW => 'abd'}
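
To see why 'abd' is the right stop row, here is a small plain-Ruby sketch (the hbase shell is JRuby, but this runs without a cluster). The key list is made up for illustration, and `scan_range` is a hypothetical helper that only mimics the start <= key < end rule; it is not how HBase executes a scan.

```ruby
# Made-up row keys, sorted in byte order the way HBase stores them.
KEYS = %w[abb9 abc4 abc20014 abc92 abd abd1 b].sort

# Hypothetical helper: keep keys with start_row <= key < stop_row.
def scan_range(keys, start_row, stop_row)
  keys.select { |k| k >= start_row && k < stop_row }
end

# The original attempt: an empty range, so nothing can match.
puts scan_range(KEYS, 'abc', 'abc').inspect
# → []

# Stop row is the prefix with its last character bumped by one.
puts scan_range(KEYS, 'abc', 'abd').inspect
# → ["abc20014", "abc4", "abc92"]
```

Note that 'abd' itself is excluded because the end of the range is exclusive.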

David Williams

That's right - looks like you found this out the hard way. :) Do you want to mark this as the right answer? – Suman Jul 10 '13 at 20:24
however hbase doc should say that startrow is actually startrowprefix – nir May 12 '14 at 21:15
If your rows only use ASCII values then it is as simple as you describe here. If you use binary rowkeys then it becomes a lot more difficult. See https://issues.apache.org/jira/browse/HBASE-11990 for the discussion and the edge cases that came up while trying to create a generic solution. – Niels Basjes Sep 29 '14 at 15:42
Does this {STARTROW => 'abc', ENDROW => 'abd'} have a Java API equivalent? I've only managed to find PrefixFilter and this range-like approach would suit me better – Matt Aug 30 '21 at 19:21

score 44 · Answer 2 · answered Jul 28 '16 at 09:19

In recent versions of HBase you can now do this in the hbase shell:

scan 'mytable', {ROWPREFIXFILTER => 'abc'}

For an ASCII prefix this is effectively the same as the following range scan, and it also works for binary rowkeys:

scan 'mytable', {STARTROW => 'abc', ENDROW => 'abd'}

This method is a LOT more efficient than the "PrefixFilter" approach because the latter puts all records through the comparison code that is present in the PrefixFilter class.
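
The "last character plus one" trick needs care when the prefix ends in a 0xFF byte, since that byte cannot be incremented. Below is a plain-Ruby sketch of how such a stop row could be computed. `stop_row_for_prefix` is a hypothetical helper name, not an hbase shell command, and the actual HBase implementation may differ in detail.

```ruby
# Hypothetical helper: smallest row key that sorts after every key
# starting with `prefix`, or nil when there is no such upper bound.
def stop_row_for_prefix(prefix)
  bytes = prefix.bytes
  # Trailing 0xFF bytes cannot be incremented, so drop them first.
  bytes.pop while !bytes.empty? && bytes.last == 0xFF
  # Empty prefix or all 0xFF: nothing sorts after it, scan to the end.
  return nil if bytes.empty?
  bytes[-1] += 1
  bytes.pack('C*')
end

puts stop_row_for_prefix('abc').inspect        # ASCII case: gives "abd"
puts stop_row_for_prefix("ab\xFF".b).inspect   # trailing 0xFF: gives "ac"
puts stop_row_for_prefix("\xFF\xFF".b).inspect # no upper bound: gives nil
```

With a stop row computed this way, the range scan still uses the start <= key < end rule, so it covers exactly the keys that begin with the prefix.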

Niels Basjes

I'm having trouble understanding the purpose of the PrefixFilter, when startrow and stoprow appear to be superior. Do you know of any use cases? I've also heard that people combine all three. – Matthew Moisen Oct 22 '16 at 19:13
I never use the PrefixFilter at all anymore. Perhaps there is a good reason to use it when doing something in a coprocessor, otherwise I would even vote to remove the class from HBase altogether. – Niels Basjes Oct 22 '16 at 19:56
Unfortunately I've been using it this whole time because I mistakenly assumed that you needed to have an exact match on the start and end rows. I ran a test on 5 million rows divided between 26 different rowkey prefixes, and the prefix filter is about 300% slower for me on average. Now I'm spending my Saturday refactoring all of my jobs :) – Matthew Moisen Oct 23 '16 at 01:17
Not sure if you would know the answer to this, but I figured I would send it your way: http://stackoverflow.com/questions/40197883/how-does-the-use-of-startrow-and-stoprow-not-result-in-a-full-table-scan-in-hbas – Matthew Moisen Oct 23 '16 at 01:19

score 26 · Answer 3 · answered Oct 30 '15 at 16:00

The accepted solution won't work in all cases (binary keys). In addition, using a PrefixFilter can be slow because it performs a table scan until it reaches the prefix. A more performant solution is to use a STARTROW and a FILTER like so:

scan 'my_table', {STARTROW => 'abc', FILTER => "PrefixFilter('abc')"}

Ben English

I'm having trouble understanding the purpose of the `PrefixFilter`, when `startrow` and `stoprow` appear to be superior. Do you know of any usecases? I've also heard that people combine all three. – Matthew Moisen Oct 22 '16 at 19:12
This is the solution that worked for me. My key is composed of AAA_B_CCC. I needed all the rows where the key started with AAA_. – Amro Younes May 10 '17 at 21:02

score 1 · Answer 4 · edited May 23 '17 at 12:18

I think what you need is a filter.

Check out the answer to the following question: Scan with filter using HBase shell.

More filters are listed at http://hbase.apache.org/book/client.filter.html

answered Jul 09 '13 at 21:37

Mehul Rathod

I am under the impression that filters are much slower than range scans. http://stackoverflow.com/questions/10942638/should-i-user-prefixfilter-or-rowkey-range-scan-in-hbase. Is there a way to do this with a range scan? – David Williams Jul 09 '13 at 21:41
@DavidWilliams : Yes, range queries are faster. – Tariq Jul 10 '13 at 03:16
