
I'm trying to connect to a kerberized HDFS, which fails with the error:

org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]

What additional parameters do I need to add when creating the Spark setup, apart from the standard configuration needed to spawn Spark worker containers?

Alok Gogate

2 Answers


Check the hadoop.security.authentication property in your hdfs-site.xml properties file.
In your case it should have the value kerberos or token.
Or you can configure it from code by specifying the property explicitly:

Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");

You can find more information about secure connection to hdfs here
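If an explicit login is also needed (the comments below mention trying a keytab and a principal), here is a minimal standalone sketch; the principal and keytab path are placeholders for your own values:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");

// Register the configuration with Hadoop's security layer, then log in with a keytab.
// loginUserFromKeytab throws IOException, so handle or declare it in real code.
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/user.keytab");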

ruslangm
  • Thanks @ruslangm for your answer. I did try adding that parameter to the configuration, but it didn't work; I'm still getting the same error. I also tried giving the keytab file and the principal, but that didn't work either. – Alok Gogate Feb 01 '19 at 06:20
  • @AlokGogate It seems I was mistaken; you need to change this property in **core-site.xml**, not **hdfs-site.xml**. Can you try this again and report the results? – ruslangm Feb 01 '19 at 10:03
  • I made the changes in those files and read them in my Spark containers, yet I'm facing the same problem. My Spark runs on Kubernetes, not in YARN mode. – Alok Gogate Feb 01 '19 at 11:44

I have also asked a very similar question here.

Firstly, please verify whether this error is occurring on your driver pod or on the executor pods. You can do this by looking at the logs of the driver and the executors as they start running. While I don't have any errors with my Spark job running only on the master, I do face this error when I spawn executors. The solution is to use a sidecar image. You can see an implementation of this in ifilonenko's project, which he referred to in his demo.

The premise of this approach is to store the delegation token (obtained by running a kinit) in a shared persistent volume. This volume can then be mounted to your driver and executor pods, giving them access to the delegation token and therefore to the kerberized HDFS. I believe you're getting this error because your executors currently do not have the delegation token necessary to access HDFS.
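To make the volume idea more concrete, here is a rough sketch of the Spark properties (available since Spark 2.4) that mount a shared persistent volume claim into both the driver and executor pods. The volume name, claim name, mount path, and token file name are placeholders, and the token file itself is assumed to have been written to that volume beforehand (e.g. by the sidecar described above); Hadoop picks the token up through the HADOOP_TOKEN_FILE_LOCATION environment variable:

import org.apache.spark.SparkConf;

// Sketch only: "krb-token" (volume name), "krb-token-pvc" (claim name) and the
// paths below are placeholders for your own setup.
SparkConf conf = new SparkConf()
    // Mount the shared PVC holding the delegation token into the driver pod...
    .set("spark.kubernetes.driver.volumes.persistentVolumeClaim.krb-token.mount.path", "/var/hadoop-tokens")
    .set("spark.kubernetes.driver.volumes.persistentVolumeClaim.krb-token.options.claimName", "krb-token-pvc")
    // ...and into every executor pod.
    .set("spark.kubernetes.executor.volumes.persistentVolumeClaim.krb-token.mount.path", "/var/hadoop-tokens")
    .set("spark.kubernetes.executor.volumes.persistentVolumeClaim.krb-token.options.claimName", "krb-token-pvc")
    // Hadoop's UserGroupInformation reads delegation tokens from this variable.
    .set("spark.kubernetes.driverEnv.HADOOP_TOKEN_FILE_LOCATION", "/var/hadoop-tokens/hadoop.token")
    .set("spark.executorEnv.HADOOP_TOKEN_FILE_LOCATION", "/var/hadoop-tokens/hadoop.token");

The same properties can equivalently be passed as --conf flags to spark-submit.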

P.S. I'm assuming you've already had a look at Spark's kubernetes documentation.

K.Naga