I use CentOS on the Cloudera QuickStart VM. I created an sbt-managed Spark application, following the other question How to save DataFrame directly to Hive?.
build.sbt
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2"
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.5.2"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.5.2"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.5.2"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.5.2"
I'd like to save a DataFrame as a Hive table as follows:
recordDF.registerTempTable("mytempTable")
hiveContext.sql("create table productstore as select * from mytempTable");
When I run this, I get the error:
The root scratch dir: /tmp/hive should be writable. Current permissions are: rwx------
I followed other questions and set chmod 777 on /tmp/hive in HDFS (i.e. hdfs dfs -chmod 777 /tmp/hive).
It then occurred to me that Spark was actually using /tmp/hive on the local filesystem, so I did the same chmod on the local directory.
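One way to confirm which filesystem a bare path like /tmp/hive resolves to is to ask the Hadoop API directly (a small sketch, using whatever configuration is on the classpath):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// The default filesystem comes from core-site.xml on the classpath;
// without it, bare paths resolve to file:/// (the local filesystem).
val fs = FileSystem.get(new Configuration())
println(fs.getUri)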
Now I am getting this error:
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:file:/user/hive/warehouse/productstore is not a directory or unable to create one)
I'd like to store the DataFrame in the Hive warehouse in HDFS, not on the local filesystem. How can I make Spark write there?
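If it matters, the DataFrameWriter route would also be fine for me, assuming it ends up in the same warehouse (a sketch, reusing the recordDF and hiveContext from above):

import org.apache.spark.sql.SaveMode

// Alternative to the CREATE TABLE ... AS SELECT above:
// write the DataFrame as a managed Hive table named productstore.
recordDF.write.mode(SaveMode.Overwrite).saveAsTable("productstore")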