
I have a Java Spark (v2.4.7) job that currently reads an entire table from HBase. The table has millions of rows, and reading all of it is very expensive in terms of memory. My process doesn't need all the data from the HBase table, so how can I avoid reading rows with specific keys?

Currently, I read from HBase as follows:

JavaRDD<Tuple2<ImmutableBytesWritable, Result>> jrdd = sparkSession.sparkContext()
        .newAPIHadoopRDD(DataContext.getConfig(), TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class)
        .toJavaRDD(); // SparkContext#newAPIHadoopRDD returns a Scala RDD, so convert it

I saw the answer in this post, but I couldn't find how to filter out specific keys.
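For reference, my understanding from the TableInputFormat docs is that it reads a serialized Scan from the configuration under the TableInputFormat.SCAN key, so something along these lines might push the key filtering down to the HBase region servers before Spark ever sees the rows. This is only a sketch: the user_0001/user_9999 range and the skip_ prefix are made-up placeholders, and withStartRow/withStopRow assume an HBase 1.4+ client (older clients use setStartRow/setStopRow instead).

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

Scan scan = new Scan();
// Restrict the read to a contiguous key range (evaluated server-side):
scan.withStartRow(Bytes.toBytes("user_0001")); // placeholder keys
scan.withStopRow(Bytes.toBytes("user_9999"));
// Additionally skip rows whose key starts with a given prefix:
scan.setFilter(new RowFilter(CompareFilter.CompareOp.NOT_EQUAL,
        new BinaryPrefixComparator(Bytes.toBytes("skip_"))));

// TableInputFormat picks the Scan up from this configuration key.
Configuration conf = DataContext.getConfig();
conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan)); // throws IOException

JavaRDD<Tuple2<ImmutableBytesWritable, Result>> jrdd = sparkSession.sparkContext()
        .newAPIHadoopRDD(conf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class)
        .toJavaRDD();

If the keys I want to exclude aren't contiguous or prefix-shaped, I believe a FilterList of RowFilters could express more complex exclusions, though each filter still runs per-row on the region servers rather than skipping whole regions.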

Any help? Thanks!

Oded
    Generally in HBase you design your table such that your query only needs to refer to a consecutive set of rows. HBase offers several different types of row filter - start with https://stackoverflow.com/questions/17558547/hbase-easy-how-to-perform-range-prefix-scan-in-hbase-shell. – Ben Watson Aug 04 '21 at 09:13
  • Why is the job memory expensive? Are you loading the entire data into memory? – shay__ Aug 05 '21 at 06:17
  • Yes. I would like to read only part of it. – Oded Aug 05 '21 at 18:26

0 Answers