I have a Spark Streaming job (Spark 2.1.1 on Cloudera 5.12) that reads from Kafka and writes to HDFS in Parquet format. The problem is that I'm randomly getting a LeaseExpiredException (not in every mini-batch):
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/qoe_fixe/data_tv/tmp/cleanData/_temporary/0/_temporary/attempt_20180629132202_0215_m_000000_0/year=2018/month=6/day=29/hour=11/source=LYO2/part-00000-c6f21a40-4088-4d97-ae0c-24fa463550ab.snappy.parquet (inode 135532024): File does not exist. Holder DFSClient_attempt_20180629132202_0215_m_000000_0_-1048963677_900 does not have any open files.
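For context, the job is wired up roughly like this. This is a simplified sketch of the usual Kafka direct-stream setup with spark-streaming-kafka-0-10; the broker address, topic name, group id, and batch interval are placeholders, not my exact values:

// Simplified sketch of the streaming setup; configuration values are placeholders.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

val spark = SparkSession.builder.appName("qoe-clean").getOrCreate()
val ssc = new StreamingContext(spark.sparkContext, Seconds(60)) // batch interval is a placeholder

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "qoe_fixe",
  "auto.offset.reset" -> "latest"
)

// Direct stream from Kafka, one input DStream for the whole job
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("tv_events"), kafkaParams)
)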
I'm using the Dataset API to write to HDFS:
if (!InputWithDatePartition.rdd.isEmpty())
  InputWithDatePartition
    .repartition(1)
    .write
    .partitionBy("year", "month", "day", "hour", "source")
    .mode("append")
    .parquet(cleanPath)
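This write runs once per micro-batch from the driver. A simplified sketch of that loop follows; toCleanDataset stands in for my actual parsing/enrichment logic (hypothetical helper), and cleanPath is inferred from the staging path in the error above:

// Simplified per-batch driver loop; toCleanDataset is a hypothetical
// placeholder for the real parsing/enrichment that produces the
// year/month/day/hour/source columns.
val cleanPath = "/user/qoe_fixe/data_tv/tmp/cleanData" // inferred from the error path

stream.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    val InputWithDatePartition = toCleanDataset(spark, rdd)
    InputWithDatePartition
      .repartition(1)
      .write
      .partitionBy("year", "month", "day", "hour", "source")
      .mode("append")
      .parquet(cleanPath)
  }
}

ssc.start()
ssc.awaitTermination()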
My job fails after a few hours because of this error.