
Is anybody using SparkSQL on HBase tables directly, the way SparkSQL is used on Hive tables? I am new to Spark. Please guide me on how to connect HBase and Spark, and how to query HBase tables.

Ram Ghadiyaram
user6608138

1 Answer


AFAIK there are two ways to connect to HBase tables:

- Connect to HBase directly:

Connect to HBase directly, create a DataFrame from the RDD, and execute SQL on top of that. I'm not going to re-invent the wheel; please see How to read from hbase using spark, as the answer from @iMKanchwala in the above link has already described it. The only remaining step is to convert that RDD into a DataFrame (using `toDF`) and follow the SQL approach.
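A minimal sketch of approach 1, written in the Spark 1.x style used elsewhere in this answer (`sc` comes from the Spark shell). The table name `users`, the column family `small`, and the qualifiers are illustrative assumptions, not fixed names:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

// Point TableInputFormat at the HBase table to scan ("users" is an assumption)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "users")

// Read (rowkey, Result) pairs from HBase as an RDD
val hbaseRDD = sc.newAPIHadoopRDD(
  conf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

// Extract the columns of interest and convert the RDD to a DataFrame
val usersDF = hbaseRDD.map { case (key, result) =>
  (Bytes.toString(key.get()),
   Bytes.toString(result.getValue(Bytes.toBytes("small"), Bytes.toBytes("name"))),
   Bytes.toString(result.getValue(Bytes.toBytes("small"), Bytes.toBytes("email"))))
}.toDF("userid", "name", "email")

// Register and query with SQL
usersDF.registerTempTable("users")
sqlContext.sql("SELECT name, email FROM users").show()
```

This needs the HBase client jars on the Spark classpath (e.g. `spark-shell --driver-class-path $(hbase classpath)`), otherwise you hit the `NoClassDefFoundError` discussed in the comments below.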

- Register the table as a Hive external table with the HBase storage handler, and query it with Hive on Spark from a HiveContext. This is also an easy way.

Ex:
CREATE TABLE users(
  userid int, name string, email string, notes string)
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" =
  ":key,small:name,small:email,large:notes");
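Once the table is registered in the Hive metastore, a HiveContext can query it from the Spark shell. A sketch (it assumes the HBase and hive-hbase-handler jars are on the Spark classpath, and that the DDL above was run against the default database):

```scala
// Spark 1.x style, matching the era of this answer
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

// The Hive metastore resolves `users` to the HBase-backed table,
// so this reads live HBase data through the storage handler
val df = sqlContext.sql("SELECT userid, name, email FROM users")
df.show()
```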

For an example of how to do that, see the Hive HBaseIntegration documentation.

I would prefer approach 1.

Hope that helps...

  • Thanks for your answer. It is very helpful. I tried the second approach (HBaseStorageHandler tables), but I am not able to connect using HiveContext. Can you please tell me how to create the context object for this type of table? It throws a ClassNotFoundException. Are any configurations required? – user6608138 Sep 22 '16 at 13:23
  • I hope you have not copied the ** characters; if you did, remove all * characters. I edited my answer as well. Regarding HiveContext: val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) is the way. – Ram Ghadiyaram Sep 22 '16 at 13:27
  • CREATE TABLE test.sample(id string,name string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,details:name") TBLPROPERTIES ("hbase.table.name" = "sample"); Starting Spark shell: spark-shell --master local[2] – user6608138 Sep 23 '16 at 07:37
  • In spark-shell: val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc); sqlContext.sql("select count(*) from test.sample").collect() java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes – user6608138 Sep 23 '16 at 07:37
  • I am doing it like this: I added this setting in hadoop-env.sh: export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/lib/* but I am still getting that NoClassDefFoundError. What is my mistake? Please suggest. – user6608138 Sep 23 '16 at 07:39
  • export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:`hbase classpath` — I added a space before the backticks (before and after hbase classpath); remove that and try. – Ram Ghadiyaram Sep 23 '16 at 08:15
  • Does the 2nd approach allows upserts? – Vibha Jun 12 '17 at 08:43