I am working on a Spark program that will load data into Hive tables, on Spark version 2.0.2. Initially, I executed these two steps in spark-shell:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder
  .master("local")
  .appName("SparkHive")
  .enableHiveSupport()
  .config("hive.exec.dynamic.partition", "true")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .config("hive.metastore.warehouse.dir", "/user/hive/warehouse")
  .getOrCreate()
When I tried to load a dataset from HDFS into Spark, I got an exception on the line below:
val partFile = spark.read.textFile("hdfs://quickstart:8020/user/cloudera/partfile")
Exception:
The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
After some research online, I learned that only the hdfs superuser has permission to modify the /tmp/hive directory; not even the root user can operate on it. I tried the command below and, as expected, it didn't work.
hadoop fs -chmod -R 777 /tmp/hive/
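Presumably the chmod would only succeed when run as the hdfs superuser; this is my assumption and I have not tried it:

# assumption: the hdfs superuser should have the rights to change /tmp/hive
sudo -u hdfs hadoop fs -chmod -R 777 /tmp/hive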
But after performing the following steps, I was able to load the file into Spark:
In HDFS, remove the /tmp/hive directory ==> hdfs dfs -rm -r /tmp/hive
At the OS level, delete the /tmp/hive directory too ==> rm -rf /tmp/hive
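I assume spark-shell recreates /tmp/hive on the next run with my user as the owner; checking the new permissions should confirm that (my assumption on how to verify):

# list the recreated directory itself to see its owner and permissions
hdfs dfs -ls -d /tmp/hive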
My question is: is this the right way to fix the issue?
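For reference, the end goal is roughly the following, using the spark session from above; the table name mydb.part_table is just a placeholder:

// read.textFile returns a Dataset[String] with a single "value" column
val partFile = spark.read.textFile("hdfs://quickstart:8020/user/cloudera/partfile")
// write the dataset into a Hive table; "mydb.part_table" is a placeholder name
partFile.write.mode("overwrite").saveAsTable("mydb.part_table")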