What configuration is required to access a Kerberized HDFS from a PySpark application running on a remote Spark cluster?

Here's my code:

from pyspark import SparkConf, SparkContext

######
# Get fs handler from java gateway
######

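# Create the Spark context. `conf` was undefined in the original snippet;
# a default SparkConf is assumed here.
conf = SparkConf()
sc = SparkContext(appName="test-hdfs", conf=conf)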

URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
fs = FileSystem.get(URI("hdfs://hadoop.com:8020"), sc._jsc.hadoopConfiguration())

fs.listStatus(Path('/hdfs/dir/'))

I keep running into the error below:

Traceback (most recent call last):
  File "/path/to/file/file.py", line 22, in <module>
    fs.listStatus(Path('/hdfs/dir/'))
  File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o28.listStatus.
: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2088)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2069)
Nakx
Bhummy
  • Recommended reading: https://stackoverflow.com/questions/42650562/access-a-secured-hive-when-running-spark-in-an-unsecured-yarn-cluster and especially the comment by Steve Loughran, a master of the dark arts of Kerberos in the world of Hadoop. – Samson Scharfrichter Feb 14 '20 at 23:22
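
For context, the `SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]` message means the client attempted unauthenticated (SIMPLE) access while the NameNode accepts only delegation tokens or Kerberos. One possible direction, offered as a minimal sketch rather than a confirmed fix, is to switch the Hadoop client configuration to Kerberos and log the driver JVM in from a keytab through Hadoop's `UserGroupInformation` before requesting the `FileSystem`. The helper names, the principal, and the keytab path below are hypothetical. The sketch also assumes that a valid `krb5.conf` is visible to the JVM and that the keytab exists on the driver host. It covers the driver only; executors that read HDFS would still need credentials of their own (for example, delegation tokens obtained by the cluster manager).

```python
def kerberos_hadoop_settings(namenode_uri, namenode_principal=None):
    """Build Hadoop client settings that request Kerberos instead of SIMPLE auth.

    namenode_principal is optional; pass it only if the cluster's
    hdfs-site.xml is not already on the driver's classpath.
    """
    settings = {
        "fs.defaultFS": namenode_uri,
        "hadoop.security.authentication": "kerberos",
    }
    if namenode_principal is not None:
        settings["dfs.namenode.kerberos.principal"] = namenode_principal
    return settings


def login_and_get_fs(sc, namenode_uri, principal, keytab_path,
                     namenode_principal=None):
    """Log the driver JVM in from a keytab and return an HDFS FileSystem handle."""
    jvm = sc._gateway.jvm
    hadoop_conf = sc._jsc.hadoopConfiguration()
    for key, value in kerberos_hadoop_settings(
            namenode_uri, namenode_principal).items():
        hadoop_conf.set(key, value)

    # UserGroupInformation caches the security mode, so it is given the
    # updated configuration before the keytab login.
    ugi = jvm.org.apache.hadoop.security.UserGroupInformation
    ugi.setConfiguration(hadoop_conf)
    ugi.loginUserFromKeytab(principal, keytab_path)

    return jvm.org.apache.hadoop.fs.FileSystem.get(
        jvm.java.net.URI(namenode_uri), hadoop_conf)


# Hypothetical usage with the SparkContext `sc` from the question:
# fs = login_and_get_fs(sc, "hdfs://hadoop.com:8020",
#                       "user@EXAMPLE.COM", "/path/to/user.keytab")
# fs.listStatus(sc._gateway.jvm.org.apache.hadoop.fs.Path("/hdfs/dir/"))
```

Keeping the settings in a plain dictionary makes them easy to inspect or to pass to Spark as `spark.hadoop.*` properties instead, if that fits the deployment better.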
