
I have Twitter data stored in an HDFS path. I am able to read the data into a Spark DataFrame as follows:

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

val df= hiveContext.read.json("/nifi/data/twitter/")

The df.printSchema and df.show commands show the result without any issue.

But when I try to store the DataFrame into a Hive table, I get the error below:

df.write.saveAsTable("tweets_32")

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /apps/hive/warehouse/tweets_32/_temporary/0/_temporary/attempt_201809260508_0002_m_000002_0/part-r-00002-c204b592-dc2a-4b2f-bc39-54afb237a6cb.gz.parquet (inode 1173647): File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_14557453_1, pendingcreates: 1]

Could someone let me know what could be the reason for this?

pheeleeppoo
I am not sure of the error, but here is what comes to mind: try using SparkSession instead of HiveContext. A SparkSession object encapsulates both HiveContext and SQLContext; a sketch of that approach is below. – Prashant Sep 26 '18 at 11:32
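For reference, a minimal sketch of the SparkSession-based equivalent on Spark 2.x (the app name is hypothetical; the path and table name are taken from the question):

import org.apache.spark.sql.SparkSession

// SparkSession with Hive support replaces HiveContext in Spark 2.x
val spark = SparkSession.builder()
  .appName("TwitterToHive") // hypothetical app name, not from the question
  .enableHiveSupport()
  .getOrCreate()

// Same read and write as in the question
val df = spark.read.json("/nifi/data/twitter/")
df.write.saveAsTable("tweets_32")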

1 Answer


The meaning of this error: another program has processed and deleted this temporary file. Check that no other task is running in parallel with yours. Another cause: your task might be slow. Hadoop does not try to diagnose and fix slow-running tasks; instead, it tries to detect them and runs backup tasks for them. You can try to fix this by disabling speculative execution in Spark and Hadoop:

sparkConf.set("spark.speculation", "false");
sparkConf.set("spark.hadoop.mapreduce.map.speculative", "false");
sparkConf.set("spark.hadoop.mapreduce.reduce.speculative", "false");

There is a thread discussing this issue.

Oleg Hmelnits