
I'm currently in the process of setting up a Kerberized environment for submitting Spark Jobs using Livy in Kubernetes.

What I've achieved so far:

  • Running Kerberized HDFS Cluster
  • Livy using SPNEGO
  • Livy submitting Jobs to k8s and spawning Spark executors
  • KNIME is able to interact with Namenode and Datanodes from outside the k8s Cluster

To achieve this, I used the following versions for the involved components:

  • Spark 2.4.4
  • Livy 0.5.0 (currently the only version supported by KNIME)
  • Namenode and Datanode 2.8.1
  • Kubernetes 1.14.3

What I'm currently struggling with:

  • Accessing HDFS from the Spark executors

The error message I currently get when trying to access HDFS from the executor is the following:

org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "livy-session-0-1575455179568-exec-1/10.42.3.242"; destination host is: "hdfs-namenode-0.hdfs-namenode.hdfs.svc.cluster.local":8020;

The following is the current state:

  1. KNIME connects to HDFS after successfully authenticating against the KDC (using keytab + principal) --> Working
  2. KNIME puts staging jars to HDFS --> Working
  3. KNIME requests a new session from Livy (SPNEGO challenge) --> Working
  4. Livy submits the Spark job with the k8s master / spawns executors --> Working
  5. KNIME submits tasks to Livy which should be executed by the executors --> Basically working
  6. When trying to access HDFS to read a file, the error mentioned above occurs (see the sketch below) --> The problem
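For illustration, the following is a minimal sketch of the kind of executor-side HDFS read that fails in step 6 (the file path is hypothetical; the namenode address is taken from the error message above). The Hadoop FileSystem call runs inside a task on the executor, which is exactly where the delegation tokens are missing:

```scala
// Minimal sketch (hypothetical file path) of an executor-side HDFS read.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hdfsPath = "hdfs://hdfs-namenode-0.hdfs-namenode.hdfs.svc.cluster.local:8020/tmp/sample.csv"

spark.sparkContext
  .parallelize(Seq(hdfsPath), 1)
  .map { p =>
    // Runs on an executor: this is where the AccessControlException surfaces
    // when the task has no TOKEN/KERBEROS credentials available.
    val fs = FileSystem.get(new java.net.URI(p), new Configuration())
    val in = fs.open(new Path(p))
    try scala.io.Source.fromInputStream(in).getLines().take(5).mkString("\n")
    finally in.close()
  }
  .collect()
  .foreach(println)
```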

Since KNIME places jar files on HDFS that have to be included in the dependencies of the Spark jobs, it is important to be able to access HDFS (KNIME also requires this to retrieve preview data from DataSets, for example).

I tried to find a solution to this but unfortunately haven't found any useful resources yet. I had a look at the code and checked UserGroupInformation.getCurrentUser().getTokens(), but that collection seems to be empty. That's why I assume that there are no delegation tokens available.
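To make that check reproducible, here is a small sketch (assuming it is pasted into the Livy/Spark Scala session) that prints the Hadoop credentials visible on the driver and inside an executor task; an empty token list on the executor side would be consistent with the exception above:

```scala
// Sketch: compare Hadoop tokens visible on the driver vs. inside an executor task.
import org.apache.hadoop.security.UserGroupInformation
import scala.collection.JavaConverters._

def describeTokens(where: String): String = {
  val ugi = UserGroupInformation.getCurrentUser
  val kinds = ugi.getTokens.asScala.map(_.getKind.toString).mkString(", ")
  s"$where: user=${ugi.getUserName}, tokens=[$kinds]"
}

// Driver side
println(describeTokens("driver"))

// Executor side
spark.sparkContext
  .parallelize(1 to 1, 1)
  .map(_ => describeTokens("executor"))
  .collect()
  .foreach(println)
```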

Has anybody ever achieved running something like this and can help me with this?

Thank you all in advance!

  • Does Livy run inside k8s? If not, did you try to run Spark in "client mode" so that the Kerberos tokens are obtained from a _real_ server with a _canonical DNS name_ (as required by Kerberos), cf. https://www.back2code.me/2018/12/spark-on-kubernetes-client-mode/ – Samson Scharfrichter Dec 04 '19 at 13:55
  • Quoting my own post below, _" set the log level for `org.apache.spark.deploy.yarn.Client` to DEBUG"_ to get some insights about what Spark actually tries to achieve about these tokens -- https://stackoverflow.com/questions/44265562/spark-on-yarn-secured-hbase – Samson Scharfrichter Dec 04 '19 at 13:58
  • Also, `-Dsun.security.krb5.debug=true -Djava.security.debug=gssloginconfig,configfile,configparser,logincontext` in the Spark client will give you insight about what the core JAAS libraries actually do when contacting the Kerberos KDC. – Samson Scharfrichter Dec 04 '19 at 14:01
  • In a nutshell, your question is not about "forwarding the token" but rather about "obtaining the token" in the first place... – Samson Scharfrichter Dec 04 '19 at 14:03
  • Thank you for your response. Livy is running inside k8s as well and is submitting to k8s in client mode (`deploy-mode client`). I will give those debug flags a try. – denglai Dec 04 '19 at 14:08
  • OK, silly of me, client or cluster mode make no difference for tokens since they are obtained by the "Spark client" _(i.e. the launcher stub)_ which is always running on the Livy node. The only difference is that client mode is easier to debug since the driver (and its logs) are also on the Livy node, no need to chase them on volatile pods... – Samson Scharfrichter Dec 04 '19 at 14:52
  • I've added the debug flags as described above. From what I can see, I'd say that access to HDFS from the driver seems to be working, but the executor remains silent when it comes to talking to Kerberos. Is it necessary to submit using --keytab and --principal? Because currently the submit is being made using --proxy-user instead – denglai Dec 04 '19 at 17:13
  • Explicit "principal / keytab" are used by long-running jobs which must renew their Hadoop tokens every 24h or so -- and only the driver can do that, because the "client launcher" may have already terminated. That's not compatible with impersonation (i.e. "proxy-user") where the job has no idea that it was launched by a privileged account (and cannot have access to its credentials...) – Samson Scharfrichter Dec 04 '19 at 19:46

1 Answer


For everybody struggling with this: it took a while to find the reason why this is not working, but basically it is related to Spark's Kubernetes implementation as of 2.4.4. There is no override defined for CoarseGrainedSchedulerBackend's fetchHadoopDelegationTokens in KubernetesClusterSchedulerBackend.

There is a pull request that solves this by passing secrets containing the delegation tokens to the executors. It has already been merged into master and is available in Spark 3.0.0-preview, but it is not (at least not yet) available in the Spark 2.4 branch.
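For context, the following is only a simplified illustration (not the actual Spark patch) of what "fetching Hadoop delegation tokens" amounts to: the driver, which holds a Kerberos TGT, asks the NameNode for delegation tokens and serializes them so they can be shipped to the executors (which the patch does via a Kubernetes secret):

```scala
// Simplified illustration only -- NOT the Spark patch itself.
import java.io.{ByteArrayOutputStream, DataOutputStream}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

def fetchSerializedHdfsTokens(hadoopConf: Configuration, renewer: String): Array[Byte] = {
  require(UserGroupInformation.isSecurityEnabled, "Kerberos must be enabled")
  val creds = new Credentials()
  // Ask the (Kerberos-authenticated) NameNode for HDFS delegation tokens.
  FileSystem.get(hadoopConf).addDelegationTokens(renewer, creds)
  // Serialize the credentials so they can be distributed to the executors.
  val bytes = new ByteArrayOutputStream()
  val out = new DataOutputStream(bytes)
  creds.writeTokenStorageToStream(out)
  out.close()
  bytes.toByteArray
}
```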
