
These are the versions we are running:

Spark 1.6.1, Hadoop 2.6.2, Hive 1.1.0

I have hive-site.xml in the $SPARK_HOME/conf directory, and the hive.metastore.uris property is also configured properly:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://host.domain.com:3306/metastore</value>
    <description>metadata is stored in a MySQL server</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>user name for connecting to the MySQL server</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>*****</value>
    <description>password for connecting to the MySQL server</description>
</property>

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://host.domain.com:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>

Unfortunately, Spark is creating a temporary Derby database (metastore_db and derby.log show up in the run directory) instead of connecting to the MySQL metastore.

I need Spark to connect to the MySQL metastore, as that is the central store for all our metadata. Please help.

Regards

Bala

Balaji Krishnan
  • Side note: if `hive.metastore.uris` is present and valid then Spark should use it to connect to the Metastore service, and ignore the `javax.jdo.*` stuff (these are used only by the Metastore service itself - and it's even a security breach to pass these to client processes!) – Samson Scharfrichter Mar 16 '17 at 17:43
  • To check whether the property was picked up by the Hadoop libraries, try `sc.hadoopConfiguration.get("hive.metastore.uris","(undefined)")` _[disclaimer - I can't test that right now, just picked it from the Spark and Hadoop docs]_ (expanded into the sketch after these comments) – Samson Scharfrichter Mar 16 '17 at 17:52
  • When accessing a regular Hive Metastore, the Hive config file should be either in `$HADOOP_CONF_DIR` or in any directory that's in the CLASSPATH (e.g. `--conf spark.driver.extraClasspath=/etc/hive/conf`) – Samson Scharfrichter Mar 16 '17 at 17:55
  • thanks @SamsonScharfrichter for the help – Balaji Krishnan Mar 17 '17 at 04:37
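
Expanding on the check suggested in the comments, here is what that looks like in a spark-shell session (a sketch based on the comment above; "(undefined)" is just a sentinel default, not a real value):

// Run inside spark-shell (Spark 1.6), where `sc` is already defined.
// If this prints "(undefined)", Spark's Hadoop configuration never picked up
// hive-site.xml, and Spark will fall back to a local Derby metastore.
val uris = sc.hadoopConfiguration.get("hive.metastore.uris", "(undefined)")
println(s"hive.metastore.uris = $uris")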

1 Answer


Can you try passing hive-site.xml to spark-submit with --files when running in cluster mode?
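
For example, something along these lines (the paths, class name, and jar name here are placeholders, not from the original post): `spark-submit --master yarn --deploy-mode cluster --files /etc/hive/conf/hive-site.xml --class com.example.HiveMetastoreCheck my-job.jar`. Inside the job, a plain HiveContext should then pick up the shipped configuration; a minimal Spark 1.6 sketch:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Minimal Spark 1.6 driver. With --files, hive-site.xml lands in each YARN
// container's working directory, which is on the classpath, so HiveContext
// should read hive.metastore.uris from it instead of falling back to Derby.
object HiveMetastoreCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-metastore-check"))
    val hiveContext = new HiveContext(sc)

    // Against the MySQL-backed metastore this lists the real databases,
    // not just "default" from a freshly created Derby instance.
    hiveContext.sql("SHOW DATABASES").collect().foreach(println)

    sc.stop()
  }
}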

Sanket_patil
  • thanks @Sanket_patil for the help. I still see the derby.log in the run directory but Spark code does create the object in Hive. thank you – Balaji Krishnan Mar 17 '17 at 04:37