Scala Spark / Shark: How to access existing Hive tables in Hortonworks?

Question

I am trying to find some docs or a description of the approach for this, please help. I have Hadoop 2.2.0 from Hortonworks installed, with some existing Hive tables I need to query. Hive SQL is extremely and unreasonably slow, both on a single node and on a cluster. I hope Shark will be faster.

From the Spark/Shark docs I cannot figure out how to make Shark work with existing Hive tables. Any ideas how to achieve this? Thanks!

score 0 · Answer 1 · edited May 23 '17 at 10:33

You need to configure the metastore within the shark-specific hive directory. Details are provided at a similar question I answered here.

In summary, copy hive-default.xml to hive-site.xml, then make sure the metastore properties are set.

Here are the basic metastore properties to set in hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://myhost/metastore</value>
  <description>the URL of the MySQL database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
</property>
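
If Shark still does not see the existing tables, it can help to confirm that the hive-site.xml it reads really contains all four properties. The following is only a minimal sketch of such a check, not part of Shark or Hive: the helper names (`read_properties`, `missing_metastore_properties`) and the `SAMPLE` text are made up for illustration, and it assumes the properties sit inside the usual `<configuration>` root element of a Hadoop-style config file.

```python
import xml.etree.ElementTree as ET

# The four metastore properties listed in the answer above.
REQUIRED = (
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
)

# Hypothetical hive-site.xml content; the values are the placeholders
# from the answer, not real connection settings.
SAMPLE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://myhost/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>mypassword</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the config text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def missing_metastore_properties(xml_text):
    """Return the required property names that are absent or have an empty value."""
    props = read_properties(xml_text)
    return [name for name in REQUIRED if not props.get(name)]


if __name__ == "__main__":
    missing = missing_metastore_properties(SAMPLE)
    print("missing properties:", missing)
```

To check a real file, read its contents and pass the text to `missing_metastore_properties`; an empty list means all four properties are present and non-empty. It says nothing about whether the database is reachable with those settings.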

You can get more details here: configuring hive metastore
