
I have a simple Spark Streaming application in Spark 2.3.0 which puts the results of each processed batch on HDFS. The application runs on YARN in client deploy mode against a kerberized Hadoop cluster (Hadoop 2.6.0-cdh5.9.3), and I have set --principal and --keytab in the spark-submit command.
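For reference, the submit command looks roughly like this (the keytab path, class and jar names are placeholders; the principal is the one that shows up in the log below):

spark-submit \
  --master yarn \
  --deploy-mode client \
  --principal spark@DCWP \
  --keytab /path/to/spark.keytab \
  --class com.example.StreamingApp \
  streaming-app.jar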

After a few days the application can no longer write to HDFS because of a missing delegation token in the cache. After restarting the application, streaming works correctly again, but after a few days it fails once more for the same reason.

This is the log from the driver:

ERROR JobScheduler: Error running job streaming job 1528366650000 ms.0
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for spark_online: HDFS_DELEGATION_TOKEN owner=spark@DCWP, renewer=yarn, realUser=, issueDate=1528232733578, maxDate=1528837533578, sequenceNumber=14567778, masterKeyId=1397) can't be found in cache

The problem goes away when I add spark.hadoop.fs.hdfs.impl.disable.cache=true to the application configuration, but disabling the cache has a big impact on processing performance.
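For reference, one way to pass that property is as an extra --conf flag on spark-submit (it could equally go into spark-defaults.conf or be set programmatically on the SparkConf):

--conf spark.hadoop.fs.hdfs.impl.disable.cache=true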

If anyone could help me, I would really appreciate it!

flaer

1 Answer


It is likely that your Kerberos ticket needs to be refreshed (which is why it works again after you restart the application).

Lifetime of Kerberos tickets has a pretty decent walkthrough on the two settings in particular that you'll have to look at.

  • Option 1: set the ticket lifetimes to a longer value (see the krb5.conf sketch further down)
  • Option 2: run a second process that simply does kinit in the background whenever you need it to (a minimal sketch follows this list)
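For Option 2, a minimal sketch would be a cron entry (or any other scheduler) that re-runs kinit from a keytab before the ticket expires; the schedule and keytab path are placeholders, and the principal is just the one from your log:

# refresh the ticket cache from a keytab every 12 hours
0 */12 * * * kinit -kt /path/to/spark.keytab spark@DCWP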

I prefer Option 1 and use 30 days or so. It has been a nice way to keep track of 'when was the last time I restarted that service'.
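In krb5.conf terms, Option 1 means something along these lines in the [libdefaults] section (ticket_lifetime and renew_lifetime are the standard setting names; what the KDC actually grants is still capped by its own policy):

[libdefaults]
  ticket_lifetime = 30d
  renew_lifetime = 30d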

James Tobin