
Below is my scenario:

  1. Initial load of data into HBase using Sqoop (this is done).
  2. From now on, I will receive daily batches of data (around 600,000 records), which are a mix of new data (records to insert into HBase) and old data (updates to existing HBase records). My question is:

How can I perform this upsert operation on the HBase table using Spark/Scala?

Your early reply would be highly appreciated.

Thanks, Souvik


1 Answer


I would advise you to read the answers to this question to get an overview.

In my answer there, I mention several options that you can use.

Since you are using Spark 1.6.1, you can use any of them. An example of working with DataFrames in hbase-spark can be found here, while a similar example for Spark-on-HBase can be found here.
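Whichever connector you pick, the key point for your scenario is that HBase `Put` overwrites the latest version of a cell for a given row key, so inserts and updates go through the same code path and no separate update logic is needed. Below is a minimal sketch using the plain HBase 1.x client API from Spark; the `Record` type, table name, column family `cf`, and column names are assumptions, not something from your schema:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD
import scala.collection.JavaConverters._

// Hypothetical record type for the daily batch.
case class Record(id: String, name: String, amount: String)

// Upsert a daily batch into an existing HBase table. Because Put
// overwrites existing cells for the same row key, new and old records
// are handled identically.
def upsertBatch(batch: RDD[Record], tableName: String): Unit = {
  batch.foreachPartition { records =>
    // One connection per partition; connections are not serializable,
    // so they must be created on the executors, not the driver.
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf(tableName))
    try {
      // Group Puts to reduce RPC round trips for the ~600k daily rows.
      records.grouped(1000).foreach { group =>
        val puts = group.map { r =>
          val put = new Put(Bytes.toBytes(r.id)) // row key = record id
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"),
                        Bytes.toBytes(r.name))
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("amount"),
                        Bytes.toBytes(r.amount))
          put
        }
        table.put(puts.toList.asJava)
      }
    } finally {
      table.close()
      connection.close()
    }
  }
}
```

The DataFrame-based connectors linked above do essentially the same thing under the hood while letting you express the write declaratively; for 600,000 rows per day, either approach is comfortably within range.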

Anton Okolnychyi
  • Hi Anton: If I use the Hive-on-HBase package (yum install hive-hbase) for bulk insert/update operations, which API will give better performance? I can execute this command through Spark itself. – Souvik Dec 26 '16 at 06:48