I have a simple Spark Streaming application on Spark 2.3.0 that writes the results of each processed batch to HDFS. The application runs on YARN in client deploy mode against a Kerberized Hadoop cluster (Hadoop 2.6.0-cdh5.9.3). I have set --principal and --keytab in the spark-submit command.
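For reference, the relevant part of my spark-submit command looks roughly like this (the keytab path, main class, and jar name are placeholders, not my real values):

spark-submit \
  --master yarn \
  --deploy-mode client \
  --principal spark@DCWP \
  --keytab /path/to/spark.keytab \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar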
After a few days the application can no longer write to HDFS because its HDFS delegation token is missing from the cache. After restarting the application, streaming works correctly again, but after a few more days it fails for the same reason.
This is the log from the driver:
ERROR JobScheduler: Error running job streaming job 1528366650000 ms.0
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for spark_online: HDFS_DELEGATION_TOKEN owner=spark@DCWP, renewer=yarn, realUser=, issueDate=1528232733578, maxDate=1528837533578, sequenceNumber=14567778, masterKeyId=1397) can't be found in cache
The problem can be worked around by adding spark.hadoop.fs.hdfs.impl.disable.cache=true to the application configuration, but disabling the FileSystem cache has a big impact on processing performance.
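For completeness, this is how I apply the workaround, as an extra --conf on spark-submit (it could equally go into spark-defaults.conf); the rest of the command is as above:

spark-submit \
  ... \
  --conf spark.hadoop.fs.hdfs.impl.disable.cache=true \
  ...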
If anyone could help me, I would really appreciate it!