First of all, your row key design needs to be sound, because it determines which access patterns you can use for queries.
1) Get works well if you know up front which row keys you need to access.
In that case you can use a method like the one below; it returns an array of Result.
/**
 * Method getDetailRecords.
 *
 * @param listOfRowKeys List<String>
 * @return Result[]
 * @throws IOException
 */
private Result[] getDetailRecords(final List<String> listOfRowKeys) throws IOException {
    final HTableInterface table = HBaseConnection.getHTable(TBL_DETAIL);
    final List<Get> listOFGets = new ArrayList<Get>();
    Result[] results = null;
    try {
        for (final String rowkey : listOfRowKeys) { // prepare batch of gets with row keys
            // System.err.println("get 'yourtablename', '" + saltIndexPrefix + rowkey + "'");
            final Get get = new Get(Bytes.toBytes(saltedRowKey(rowkey)));
            get.addColumn(COLUMN_FAMILY, Bytes.toBytes(yourcolumnname));
            listOFGets.add(get);
        }
        results = table.get(listOFGets);
    } finally {
        table.close();
    }
    return results;
}
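The saltedRowKey helper called above is not shown. As a rough sketch only, one common way to salt a key is to prefix it with a bucket number derived from a hash of the key. The class name SaltedKeys, the bucket count of 16, and the "NN-" prefix format are all assumptions made for illustration, not the actual helper used here.

```java
// Hypothetical salting helper: every name and constant here is an assumption.
final class SaltedKeys {
    // Assumed number of salt buckets; in practice this is tied to the region layout.
    static final int BUCKETS = 16;

    /** Prefixes the row key with a two-digit bucket derived from its hash. */
    public static String saltedRowKey(String rowKey) {
        // floorMod keeps the bucket non-negative even for negative hash codes.
        int bucket = Math.floorMod(rowKey.hashCode(), BUCKETS);
        return String.format("%02d-%s", bucket, rowKey);
    }

    public static void main(String[] args) {
        // The same input always yields the same salted key, so Gets can recompute it.
        System.out.println(saltedRowKey("user42"));
    }
}
```

Because the prefix is computed from the key itself, a Get can rebuild the salted key without a lookup, which is what the batch method above relies on.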
2) In my experience, HBase Scan performance is rather poor if the row key design does not suit the query. If you opt for a scan in the scenario you describe, I recommend FuzzyRowFilter (see hbase-the-definitive). It has been really useful in our case; we have used it from bulk clients such as MapReduce as well as from standalone HBase clients.
This filter acts on row keys, but in a fuzzy manner. It needs a list of row keys that should be returned, plus an accompanying byte[] array that signifies the importance of each byte in the row key. The constructor is as such:
FuzzyRowFilter(List<Pair<byte[], byte[]>> fuzzyKeysData)
The fuzzyKeysData specifies the mentioned significance of a row key byte, by taking one of two values:
0 Indicates that the byte at the same position in the row key must match as-is.
1 Means that the corresponding row key byte does not matter and is always accepted.
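To make the 0/1 semantics concrete, here is a small plain-Java sketch of the per-byte comparison. It is only an illustration of the rule described above, not HBase's implementation: the class FuzzyMaskDemo and its matches method are hypothetical, and for simplicity it only compares keys of equal length.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: hypothetical class, not part of the HBase API.
final class FuzzyMaskDemo {
    /**
     * Returns true when rowKey agrees with fuzzyKey at every position whose
     * mask byte is 0; positions whose mask byte is 1 are accepted unchecked.
     */
    public static boolean matches(byte[] rowKey, byte[] fuzzyKey, byte[] mask) {
        if (rowKey.length != fuzzyKey.length || fuzzyKey.length != mask.length) {
            return false; // simplification: equal-length keys only
        }
        for (int i = 0; i < fuzzyKey.length; i++) {
            if (mask[i] == 0 && rowKey[i] != fuzzyKey[i]) {
                return false; // a fixed position differs
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] fuzzyKey = "a?c".getBytes(StandardCharsets.US_ASCII);
        byte[] mask = {0, 1, 0}; // the middle byte is a wildcard
        System.out.println(matches("abc".getBytes(StandardCharsets.US_ASCII), fuzzyKey, mask)); // true
        System.out.println(matches("abd".getBytes(StandardCharsets.US_ASCII), fuzzyKey, mask)); // false
    }
}
```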
Example: Partial Row Key Matching
A possible example is matching partial keys, but not from left to right, rather somewhere inside a compound key. Assume a row key format of <userId>_<actionId>_<year>_<month>, with fixed length parts, where <userId> is 4, <actionId> is 2, <year> is 4, and <month> is 2 bytes long. The application now requests all users that performed a certain action (encoded as 99) in January of any year. Then the pair for row key and fuzzy data would be the following:
row key
"????_99_????_01", where the "?" is an arbitrary character, since it is ignored.
fuzzy data
= "\x01\x01\x01\x01\x00\x00\x00\x00\x01\x01\x01\x01\x00\x00\x00"
In other words, the fuzzy data array instructs the filter to find all row keys matching "????_99_????_01", where the "?" will accept any character.
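Writing the fuzzy data array by hand is error-prone, so it can help to derive it from the pattern string. The sketch below does that in plain Java for the 15-byte pattern written with its separators, "????_99_????_01". The class FuzzyPatternBuilder and its methods are hypothetical helpers, and treating '?' as the wildcard marker is a convention of this sketch, not something HBase interprets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: derives the key bytes and mask from a '?' pattern.
final class FuzzyPatternBuilder {
    /** Mask byte is 1 where the pattern has '?', and 0 where the byte is fixed. */
    public static byte[] maskFor(String pattern) {
        byte[] mask = new byte[pattern.length()];
        for (int i = 0; i < pattern.length(); i++) {
            mask[i] = (byte) (pattern.charAt(i) == '?' ? 1 : 0);
        }
        return mask;
    }

    /** Key bytes of the pattern; the bytes at '?' positions are ignored by the mask. */
    public static byte[] keyFor(String pattern) {
        return pattern.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String pattern = "????_99_????_01";
        // prints [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
        System.out.println(Arrays.toString(maskFor(pattern)));
    }
}
```

The two arrays would then be wrapped in a Pair and passed to the FuzzyRowFilter constructor shown earlier.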
An advantage of this filter is that it can likely compute the next matching row key when it comes to an end of a matching one. It implements the getNextCellHint() method to help the servers in fast-forwarding to the next range of rows that might match. This speeds up scanning, especially when the skipped ranges are quite large. Example 4-12 uses the filter to grab specific rows from a test data set.
Example: filtering rows by fuzzy row key
List<Pair<byte[], byte[]>> keys = new ArrayList<Pair<byte[], byte[]>>();
keys.add(new Pair<byte[], byte[]>(
    Bytes.toBytes("row-?5"), new byte[] { 0, 0, 0, 0, 1, 0 }));
Filter filter = new FuzzyRowFilter(keys);
Scan scan = new Scan()
    .addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-5"))
    .setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    System.out.println(result);
}
scanner.close();
The example code also adds a filtering column to the scan, just to keep the output short:
Adding rows to table...
Results of scan:
keyvalues={row-05/colfam1:col-01/1/Put/vlen=9/seqid=0,
row-05/colfam1:col-02/2/Put/vlen=9/seqid=0,
...
row-05/colfam1:col-09/9/Put/vlen=9/seqid=0,
row-05/colfam1:col-10/10/Put/vlen=9/seqid=0}
keyvalues={row-15/colfam1:col-01/1/Put/vlen=9/seqid=0,
row-15/colfam1:col-02/2/Put/vlen=9/seqid=0,
...
row-15/colfam1:col-09/9/Put/vlen=9/seqid=0,
row-15/colfam1:col-10/10/Put/vlen=9/seqid=0}
The test code wiring adds 20 rows to the table, named row-01 to row-20. We want to retrieve all the rows that match the pattern row-?5, in other words all rows that end in the number 5. The output above confirms the correct result.