7

How do you set a hive property like: hive.metastore.warehouse.dir at runtime? Or at least a more dynamic way of setting a property like the above, than putting it in a file like spark_home/conf/hive-site.xml

hbogert
  • 4,198
  • 5
  • 24
  • 38
  • You can use `set <property>=<value>;` inside your `.hql` query. – o-90 Sep 15 '15 at 13:31
  • I've tried this, however this does not seem to have effect. Also setting sqlContext.setConf("hive.metastore.warehouse.dir", "/path") does not work – hbogert Sep 15 '15 at 13:48
  • I can see that spark-sql actually has a `--hiveconf` parameter. This is not available in spark-shell – hbogert Sep 15 '15 at 14:03
  • @hbogert were you able to resolve this problem? I'm encountering someting similar: http://stackoverflow.com/questions/37061544/hive-configuration-for-spark-integration-tests – Sim May 06 '16 at 02:21
  • hi @hbogert did you have the chance to try my suggestion below? I was curious if that worked for you – abiratsis Feb 28 '19 at 17:39
  • hi @AlexandrosBiratsis I'm afraid I am no longer able to verify that. This question is 3 and a half years old :) I do vaguely remember that the exact 'hive.metastore.warehouse.dir' was problematic, but others worked fine when trying to set them at runtime. So to err on the safe side, I think I cannot accept your answer at this point. Can you verify that you can actually change `hive.metastore.warehouse.dir`? – hbogert Feb 28 '19 at 20:51
  • hi @hbogert I am sorry I didn't provide earlier details about my attempts. But yes it worked as described below. I updated my post with the latest details. Please keep in mind that I am using Spark 2.4.0 – abiratsis Mar 03 '19 at 18:50

1 Answer

5

I faced the same issue and, in my case (Spark 2.4.0), it worked by setting the Hive property through Spark itself. Please find below all the options for doing so via spark-shell, spark-submit and SparkConf.

Option 1 (spark-shell)

spark-shell --conf spark.hadoop.hive.metastore.warehouse.dir=some_path\metastore_db_2

Initially I tried spark-shell with the bare hive.metastore.warehouse.dir property set to some_path\metastore_db_2 (without the spark.hadoop prefix) and got the following warning:

Warning: Ignoring non-spark config property: hive.metastore.warehouse.dir=C:\winutils\hadoop-2.7.1\bin\metastore_db_2

Despite the warning, when I create a Hive table with:

bigDf.write.mode("overwrite").saveAsTable("big_table")

the Hive metadata is still stored correctly under the metastore_db_2 folder.

When I use spark.hadoop.hive.metastore.warehouse.dir the warning disappears and the results are still saved in the metastore_db_2 directory.
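To double-check from inside the spark-shell that the spark.hadoop.-prefixed value really ended up in the Hadoop configuration, something like the following can be run (just a sketch, using the standard SparkContext and Hadoop Configuration APIs):

// should print the path passed via --conf spark.hadoop.hive.metastore.warehouse.dir
spark.sparkContext.hadoopConfiguration.get("hive.metastore.warehouse.dir")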

Option 2 (spark-submit)

To use hive.metastore.warehouse.dir when submitting a job with spark-submit, I followed these steps.

First, I wrote some code to save some random data into a Hive table:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val sparkConf = new SparkConf().setAppName("metastore_test").setMaster("local")
val spark = SparkSession.builder().config(sparkConf).getOrCreate()

import spark.implicits._

// small DataFrame with some random data
val dfA = spark.createDataset(Seq(
      (1, "val1", "p1"),
      (2, "val1", "p2"),
      (3, "val2", "p3"),
      (3, "val3", "p4"))).toDF("id", "value", "p")

// save it as a Hive table; the metastore location is taken from
// spark.hadoop.hive.metastore.warehouse.dir passed via spark-submit
dfA.write.mode("overwrite").saveAsTable("metastore_test")

spark.sql("select * from metastore_test").show(false)

Next I submitted the job with:

spark-submit --class org.tests.Main \
        --conf spark.hadoop.hive.metastore.warehouse.dir=C:\winutils\hadoop-2.7.1\bin\metastore_db_2 \
        spark-scala-test_2.11-0.1.jar

The metastore_test table was properly created under the C:\winutils\hadoop-2.7.1\bin\metastore_db_2 folder.
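If in doubt whether the --conf value actually reached the driver, a quick check can be added to the job code (a sketch; the property names assume the job was submitted exactly as above):

// value as seen by the Spark configuration (with the spark.hadoop. prefix)
println(spark.sparkContext.getConf.get("spark.hadoop.hive.metastore.warehouse.dir"))
// value as copied into the Hadoop configuration (prefix stripped)
println(spark.sparkContext.hadoopConfiguration.get("hive.metastore.warehouse.dir"))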

Option 3 (SparkConf)

Via the SparkConf that is passed to the SparkSession in the Spark code.

val sparkConf = new SparkConf()
      .setAppName("metastore_test")
      .set("spark.hadoop.hive.metastore.warehouse.dir", "C:\\winutils\\hadoop-2.7.1\\bin\\metastore_db_2")
      .setMaster("local")

This attempt was successful as well.
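For completeness, here is a minimal sketch of how such a SparkConf can be wired into a session and exercised (the table name metastore_test_conf is just an example; enableHiveSupport requires the spark-hive module on the classpath):

import org.apache.spark.sql.SparkSession

// build the session from the SparkConf defined above; the spark.hadoop.* entry
// is copied into the underlying Hadoop configuration at startup
val spark = SparkSession.builder()
  .config(sparkConf)
  .enableHiveSupport()
  .getOrCreate()

spark.range(5).write.mode("overwrite").saveAsTable("metastore_test_conf")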

The question that still remains is why I have to prefix the property with spark.hadoop for it to work as expected.

abiratsis
  • 7,051
  • 3
  • 28
  • 46
  • 2
    answer to your last question: https://github.com/apache/spark/blob/v2.3.1/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L501 – Yordan Georgiev May 02 '19 at 12:46