58

Does HBase have any command that works like a SQL LIMIT query?

I can do it with setStart and setEnd, but I do not want to iterate over all rows.

030
Mohammad
  • Do you want to limit the results based on some condition, or just a simple limit that shows the top 'n' records? – Tariq Dec 22 '12 at 19:04

5 Answers

107

From the HBase shell you can use LIMIT:

hbase> scan 'test-table', {'LIMIT' => 5}

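From the Java API, Scan.setMaxResultSize(N) caps the result size in bytes and scan.setMaxResultsPerColumnFamily(N) caps the number of values returned per row per column family, so neither limits the row count directly. On versions that provide it, Scan.setLimit(N) limits the number of rows.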

slm
th30z
  • 3
    In order for this to work, there needs to be a comma between the table name and {'LIMIT'...}, i.e. scan 'test-table', {'LIMIT' => 5} – Engineiro Nov 12 '13 at 22:27
  • 3
    setMaxResultSize is not available in all versions of Scan; for older versions you need to use PageFilter, as in @mirsik's example – javamonkey79 Jun 27 '15 at 00:45
  • According to the docs, you specify `setMaxResultSize()` in bytes. If you want a certain number of rows, you need to use `setLimit()`. – JZC Sep 24 '18 at 22:19
14

There is a filter called PageFilter. It's meant for this purpose.

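import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.util.Bytes;

Scan scan = new Scan(Bytes.toBytes("smith-")); // start at row key "smith-"
scan.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("givenName"));
scan.addColumn(Bytes.toBytes("contactinfo"), Bytes.toBytes("email"));
scan.setFilter(new PageFilter(25)); // at most 25 rows per region server
// try-with-resources closes the scanner once the loop is done
try (ResultScanner scanner = table.getScanner(scan)) {
    for (Result result : scanner) {
        // ...
    }
}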

http://java.dzone.com/articles/handling-big-data-hbase-part-4

mirsik
7

In the HBase shell, the following command limits the query results. The "LIMIT" must be enclosed in single quotes.

scan 'table-name', {'LIMIT' => 10}
PrasoonMishra
Animesh Raj Jha
1

A guaranteed way is to do the limiting on the client side, inside the iterator loop. This is the approach taken in the HBase Ruby Shell. From table.rb ($HBASE_HOME/hbase-shell/src/main/ruby/hbase/table.rb): Line 467:

  # Start the scanner
  scanner = @table.getScanner(_hash_to_scan(args))
  iter = scanner.iterator

  # Iterate results
  while iter.hasNext
    if limit > 0 && count >= limit
      break
    end

    row = iter.next
    ...
  end

It can be made a bit more efficient by adding scan.setFilter(new PageFilter(limit)) and scan.setCaching(limit) before calling table.getScanner(scan). The page filter ensures that each region server returns at most limit rows, and the scan caching setting ensures that each region server reads ahead and caches at most limit rows. The client-side check can then break out of the loop after receiving the first limit rows, in the order the client receives them.
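The interplay between the per-region filter and the client-side check can be sketched in plain Java, without an HBase cluster. This is only a toy model, not HBase API: the inner lists stand in for region servers, `perRegionCap` plays the role of PageFilter, and the names `PageLimitSketch` and `scanWithLimit` are made up for illustration. It shows why the client-side check is still needed: every region may offer up to limit rows, so only the client loop guarantees the overall cap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PageLimitSketch {

    // Toy model: each inner list holds the rows of one region server.
    // A limit <= 0 means "no limit", mirroring the shell loop above.
    public static List<String> scanWithLimit(List<List<String>> regions, int limit) {
        List<String> received = new ArrayList<>();
        for (List<String> region : regions) {
            // PageFilter-like step: a region hands back at most `limit` rows.
            int perRegionCap = limit > 0 ? Math.min(limit, region.size()) : region.size();
            for (String row : region.subList(0, perRegionCap)) {
                // Client-side check: stop as soon as enough rows have arrived.
                if (limit > 0 && received.size() >= limit) {
                    return received;
                }
                received.add(row);
            }
        }
        return received;
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("a1", "a2"),
                Arrays.asList("b1", "b2", "b3", "b4"),
                Arrays.asList("c1"));
        // The regions could offer 2 + 3 + 1 = 6 rows; the client keeps 3.
        System.out.println(scanWithLimit(regions, 3)); // prints [a1, a2, b1]
    }
}
```

In this model the per-region cap only reduces how much work each region does; the exact row count comes from the loop on the client.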

0

In HBase 1.2, Scan.setMaxResultSize(N) does not limit the number of rows. maxResultSize limits the maximum result size in bytes (cached on the client side). I found that ResultScanner.next(int nbRows) can limit the number of rows during iteration.
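To illustrate the difference, here is a toy model in plain Java. It is not HBase API, and `ByteCapVsRowLimit`, `chunkByBytes` and `firstRows` are hypothetical names. It assumes that a byte cap only bounds how much data comes back per fetch, so every row still arrives eventually, whereas a row limit stops after N rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ByteCapVsRowLimit {

    // Groups rows into fetches of at most maxBytes each. A single oversized
    // row still gets a fetch of its own. No row is dropped.
    public static List<List<byte[]>> chunkByBytes(List<byte[]> rows, long maxBytes) {
        List<List<byte[]>> fetches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long used = 0;
        for (byte[] row : rows) {
            if (!current.isEmpty() && used + row.length > maxBytes) {
                fetches.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(row);
            used += row.length;
        }
        if (!current.isEmpty()) {
            fetches.add(current);
        }
        return fetches;
    }

    // Row-count limit: keeps only the first n rows, like a SQL LIMIT.
    public static List<byte[]> firstRows(List<byte[]> rows, int n) {
        int end = Math.min(Math.max(n, 0), rows.size());
        return new ArrayList<>(rows.subList(0, end));
    }

    public static void main(String[] args) {
        List<byte[]> rows = Arrays.asList(
                new byte[40], new byte[40], new byte[40], new byte[10]);
        int total = 0;
        for (List<byte[]> fetch : chunkByBytes(rows, 100)) {
            total += fetch.size();
        }
        System.out.println("rows seen with a 100-byte cap: " + total);
        System.out.println("rows seen with a limit of 2: " + firstRows(rows, 2).size());
    }
}
```

With the sample data the byte cap splits the rows into two fetches but still delivers all four, while the row limit delivers two.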