Is anybody using Spark SQL on HBase tables directly, the way Spark SQL is used on Hive tables? I am new to Spark. Please guide me on how to connect Spark to HBase and how to query HBase tables.
-
Without any comments on my question, it has been downvoted to -3. What is the reason? It's not fair. – user6608138 Sep 16 '16 at 12:58
-
Why do you need to query an HBase table? – Avijit Sep 16 '16 at 13:28
-
You can create an external table over HBase in Hive. Since HBase is a NoSQL, distributed, column-oriented database built on top of the Hadoop file system, I have serious doubts about how far you can query it. – Avijit Sep 16 '16 at 13:30
-
@Avijit, thanks for the reply. I tried your suggested approach but could not complete it successfully. Please refer to this link: http://stackoverflow.com/questions/39285262/sparksqlhivehbasehbaseintegration-doesnt-work – user6608138 Sep 19 '16 at 07:25
-
Hi @user6608138, please try http://stackoverflow.com/questions/25040709/how-to-read-from-hbase-using-spark – Ram Ghadiyaram Sep 19 '16 at 15:44
-
Was my answer helpful? Feel free to ask questions. – Ram Ghadiyaram Sep 21 '16 at 07:13
-
Yes, it is a great help for me. Thanks for this reply. – user6608138 Sep 21 '16 at 08:58
1 Answer
AFAIK, there are two ways to connect to HBase tables:
- Connect to HBase directly: connect to HBase, create a DataFrame from the resulting RDD, and execute SQL on top of it. I'm not going to reinvent the wheel; please see How to read from hbase using spark, as the answer from @iMKanchwala in that link has already described it. The only remaining step is to convert the result into a DataFrame (using toDF) and follow the SQL approach.
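As a sketch of approach 1 for Spark 1.x in spark-shell (the table name "users" and the column family/qualifier pairs "small:name" and "small:email" are assumptions; this needs a live HBase cluster and the HBase client jars on Spark's classpath):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

// Point the Hadoop InputFormat at the (hypothetical) HBase table "users".
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "users")

// Read the table as an RDD of (rowkey, Result) pairs.
val hbaseRDD = sc.newAPIHadoopRDD(conf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

// Pull out the row key plus the assumed "small:name" and "small:email" cells.
val rows = hbaseRDD.map { case (key, result) =>
  (Bytes.toString(key.get()),
   Bytes.toString(result.getValue(Bytes.toBytes("small"), Bytes.toBytes("name"))),
   Bytes.toString(result.getValue(Bytes.toBytes("small"), Bytes.toBytes("email"))))
}

// Convert the RDD to a DataFrame and query it with SQL.
import sqlContext.implicits._
val df = rows.toDF("userid", "name", "email")
df.registerTempTable("users")
sqlContext.sql("SELECT name, email FROM users").show()
```

In spark-shell, `sc` and `sqlContext` already exist; in a standalone app you would create them yourself.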
- Register the table as a Hive external table with the HBase storage handler, and then use Hive on Spark from a HiveContext. This is also an easy way.
Ex :
CREATE TABLE users(
  userid int, name string, email string, notes string)
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" =
  ":key,small:name,small:email,large:notes");
For an example of how to do that, see the Hive HBaseIntegration documentation.
I would prefer approach 1.
Hope that helps...
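Once the storage-handler table exists, querying it from Spark is ordinary HiveContext SQL. A minimal sketch for Spark 1.x in spark-shell (assumes the HBase jars are on Spark's classpath and the `users` table above was created in Hive):

```scala
// In spark-shell, sc is already available.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Hive resolves the HBaseStorageHandler and reads from HBase under the hood,
// so this looks exactly like querying any other Hive table.
val users = sqlContext.sql("SELECT userid, name, email FROM users")
users.show()
```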

Ram Ghadiyaram
-
Thanks for your answer; it is a great help for me. I tried the second approach (HBaseStorageHandler tables), but I am not able to connect using HiveContext. Can you please tell me how to create a context object for this type of table? It throws ClassNotFoundException. Are there any configurations required? – user6608138 Sep 22 '16 at 13:23
-
I hope you have not used the ** characters; if you did, remove all the * characters. I edited my answer as well. Regarding HiveContext: val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) is the way. – Ram Ghadiyaram Sep 22 '16 at 13:27
-
CREATE TABLE test.sample(id string, name string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,details:name") TBLPROPERTIES ("hbase.table.name" = "sample"); Then I start the Spark shell with: spark-shell --master local[2] – user6608138 Sep 23 '16 at 07:37
-
In spark-shell: val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc); sqlContext.sql("select count(*) from test.sample").collect() throws java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes – user6608138 Sep 23 '16 at 07:37
-
I am doing it like this: I added export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/lib/* in hadoop-env.sh, but I am still getting that NoClassDefFoundError. What is my mistake? Please suggest. – user6608138 Sep 23 '16 at 07:39
-
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:` hbase classpath ` — I added a space before and after hbase classpath inside the backquotes so it would display here; you remove those spaces and try. – Ram Ghadiyaram Sep 23 '16 at 08:15
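The same classpath fix can also be applied when launching spark-shell itself, which is what the NoClassDefFoundError above usually calls for. A hedged sketch (assumes the `hbase` command is on PATH; property names are standard Spark settings, but adjust paths for your install):

```shell
# `hbase classpath` prints HBase's full classpath, including hbase-common
# (which contains org.apache.hadoop.hbase.util.Bytes).
HBASE_CP=$(hbase classpath)

# Expose the HBase jars to both the driver and the executors.
spark-shell --master "local[2]" \
  --driver-class-path "$HBASE_CP" \
  --conf spark.executor.extraClassPath="$HBASE_CP"
```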