
Does anyone know anything about the HBase REST API? I'm currently writing a program that inserts into and reads from HBase using curl commands. To read, I use a curl GET request, e.g.

curl -X GET 'http://server:9090/test/Row-1/Action:ActionType/' -H 'Accept: application/json'

This returns the column Action:ActionType from Row-1. However, I am stuck when I want to do the equivalent of a WHERE clause with a GET request, and I'm not sure it is even possible. For example, how would I find all records where Action:ActionType = 1? Help is appreciated!

AlanDev1989

1 Answer

You can do this by using a filter (here a SingleColumnValueFilter) in your curl request.

First, create an XML file (myscanner.xml) describing your scan. Here we want to filter on a qualifier value with the EQUAL operator:

<Scanner batch="10">
    <filter>
        {
            "type": "SingleColumnValueFilter",
            "op": "EQUAL",
            "family": "<FAMILY_BASE64>",
            "qualifier": "<QUALIFIER_BASE64>",
            "latestVersion": true,
            "comparator": {
                "type": "BinaryComparator",
                "value": "<SEARCHED_VALUE_BASE64>"
            }
        }
    </filter>
</Scanner>

Replace <FAMILY_BASE64>, <QUALIFIER_BASE64> and <SEARCHED_VALUE_BASE64> with your own values. The values must be base64-encoded, which you can do with echo -n "${FAMILY}" | base64.
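As a minimal sketch, here is how the three placeholders could be computed for the example in the question (family Action, qualifier ActionType, value 1). The helper name b64 is mine, and I am assuming the value was stored as the text "1" rather than as a binary integer:

```shell
# Base64-encode a string without a trailing newline
# (a stray newline would change the encoded value).
b64() {
    printf '%s' "$1" | base64
}

# Values taken from the question; adjust to your own table.
FAMILY_BASE64="$(b64 'Action')"
QUALIFIER_BASE64="$(b64 'ActionType')"
# Assumption: the cell holds the string "1", not a serialized number.
SEARCHED_VALUE_BASE64="$(b64 '1')"

echo "family:    ${FAMILY_BASE64}"
echo "qualifier: ${QUALIFIER_BASE64}"
echo "value:     ${SEARCHED_VALUE_BASE64}"
```

The three printed strings go into myscanner.xml in place of the placeholders.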

Then submit a curl request to the HBase REST API with this XML file as data:

curl -vi -X PUT \
    -H "Content-Type:text/xml" \
    -d @myscanner.xml \
    "http://${HOST}:${REST_API_PORT}/${TABLE_NAME}/scanner/"

This request creates a scanner and should return its URL in the Location response header, like:

[...]
Location: http://${HOST}:${REST_API_PORT}/${TABLE_NAME}/scanner/149123344543470bea57a
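If you are scripting this, the scanner URL can be pulled out of the response headers instead of being copied by hand. A rough sketch; the function name scanner_url_from_headers is hypothetical, and it only assumes the server sends a Location header as shown above:

```shell
# Print the value of the Location header found in raw HTTP headers on stdin.
scanner_url_from_headers() {
    # HTTP header lines end with CRLF, so drop the carriage returns first.
    # Header names are case-insensitive, hence the tolower().
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Possible usage (HOST, REST_API_PORT and TABLE_NAME as in the request above);
# -D - dumps the response headers to stdout, -o /dev/null discards the body:
# SCANNER_URL="$(curl -s -o /dev/null -D - -X PUT \
#     -H 'Content-Type: text/xml' -d @myscanner.xml \
#     "http://${HOST}:${REST_API_PORT}/${TABLE_NAME}/scanner/" \
#     | scanner_url_from_headers)"
```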

Then use the returned scanner URL to iterate through the results (repeat the request to fetch successive batches):

curl -vi -X GET \
    -H "Accept: text/xml" \
    "http://${HOST}:${REST_API_PORT}/${TABLE_NAME}/scanner/149123344543470bea57a"

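Repeating the GET by hand gets tedious, so here is a sketch of a loop that keeps asking for batches until the scanner is exhausted. As far as I know, the REST server answers 204 No Content (empty body) once there is nothing left, and a DELETE on the scanner URL releases it. The function name drain_scanner is hypothetical, and the loop simply stops on any status other than 200:

```shell
# Fetch every batch from an HBase REST scanner, then delete the scanner.
# $1 is the scanner URL taken from the Location header.
drain_scanner() {
    local scanner_url="$1"
    local body status
    body="$(mktemp)"
    while :; do
        # -o writes the batch to a temp file, -w prints only the HTTP status.
        status="$(curl -s -o "$body" -w '%{http_code}' \
            -H 'Accept: application/json' "$scanner_url")"
        # 200 means a batch was returned; anything else ends the loop
        # (expected: 204 when the scanner is exhausted).
        [ "$status" = "200" ] || break
        cat "$body"
        echo
    done
    rm -f "$body"
    # Release the scanner on the server side.
    curl -s -o /dev/null -X DELETE "$scanner_url"
}

# Possible usage:
# drain_scanner "http://${HOST}:${REST_API_PORT}/${TABLE_NAME}/scanner/149123344543470bea57a"
```

Each iteration prints one batch, so the number of cells per line of output depends on the batch attribute in myscanner.xml.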
You can also accept "application/json" instead of XML. Note that the results are base64-encoded.
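To read the results, each base64 string in the response has to be decoded. A small sketch; the helper name b64_decode is mine, and the sample strings are simply the encodings of Row-1 and 1 from the question, not actual server output:

```shell
# Decode a single base64 string taken from a scanner response.
b64_decode() {
    printf '%s' "$1" | base64 --decode
}

# Illustrative inputs: "Um93LTE=" encodes Row-1, "MQ==" encodes 1.
ROW_KEY="$(b64_decode 'Um93LTE=')"
CELL_VALUE="$(b64_decode 'MQ==')"

echo "row key: ${ROW_KEY}"
echo "value:   ${CELL_VALUE}"
```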

Sources:

HBase REST Filter ( SingleColumnValueFilter )

A list of filters you can use: https://gist.github.com/stelcheck/3979381

Cloudera documentation about the HBase REST API: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_hbase_rest_api.html

norbjd
  • Does this return multiple rows? I need to return multiple rows with a limit. For example 15 rows or rows within a certain time range. – Abhijay Ghildyal Apr 18 '17 at 11:48
  • Yes, as I mentioned you can iterate through the scanner results to get multiple rows. – norbjd Apr 18 '17 at 13:25
  • When I iterate through, it only gives me the other columns of that very row – Abhijay Ghildyal Apr 18 '17 at 14:13
  • If the `batch` parameter in the `scanner` tag is too small (compared to your number of qualifiers), it might look like the scanner always returns results for the same row when iterating. But in fact, if you iterate enough, you should see other rows. Try adding the filter `{"type": "FirstKeyOnlyFilter"}` (keeps only the rowkey) to your scanner to validate that multiple rows are indeed returned. – norbjd Apr 19 '17 at 10:46
  • It said content-length : 0 – Abhijay Ghildyal Apr 20 '17 at 20:03
  • It might be a good idea to open a new thread with more explanations if you want further help. – norbjd Apr 21 '17 at 10:29