
I am trying to connect to an HDFS cluster from Python using the snakebite-py3 library. When I set use_sasl to True, I get the following error:

Code Snippet:

from snakebite.client import Client

client = Client(host='hostname', port=8020,
                effective_user='user', use_sasl=True)

for x in client.ls(['/']):
    print(x, "\n")

Error:

---------------------------------------------------------------------------
GSSError                                  Traceback (most recent call last)
<ipython-input-21-62c8b8df16ea> in <module>
      2 from snakebite.client import Client
      3 
----> 4 client = Client(host='hostname',port=8020, effective_user='user', use_sasl=True)
      5 
      6 for x in client.ls(['/test_abha']): print(x,"\n")

C:\ProgramData\Anaconda3\lib\site-packages\snakebite\client.py in __init__(self, host, port, hadoop_version, use_trash, effective_user, use_sasl, hdfs_namenode_principal, sock_connect_timeout, sock_request_timeout, use_datanode_hostname)
    126         self.hdfs_namenode_principal = hdfs_namenode_principal
    127         self.service_stub_class = client_proto.ClientNamenodeProtocol_Stub
--> 128         self.service = RpcService(self.service_stub_class, self.port, self.host, hadoop_version,
    129                                   effective_user,self.use_sasl, self.hdfs_namenode_principal,
    130                                   sock_connect_timeout, sock_request_timeout)

C:\ProgramData\Anaconda3\lib\site-packages\snakebite\service.py in __init__(self, service_stub_class, port, host, hadoop_version, effective_user, use_sasl, hdfs_namenode_principal, sock_connect_timeout, sock_request_timeout)
     30 
     31         # Setup the RPC channel
---> 32         self.channel = SocketRpcChannel(host=self.host, port=self.port, version=hadoop_version,
     33                                         effective_user=effective_user, use_sasl=use_sasl,
     34                                         hdfs_namenode_principal=hdfs_namenode_principal,

C:\ProgramData\Anaconda3\lib\site-packages\snakebite\channel.py in __init__(self, host, port, version, effective_user, use_sasl, hdfs_namenode_principal, sock_connect_timeout, sock_request_timeout)
    193                 raise FatalException("Kerberos libs not found. Please install snakebite using 'pip install snakebite[kerberos]'")
    194 
--> 195             kerberos = Kerberos()
    196             self.effective_user = effective_user or kerberos.user_principal()
    197         else:

C:\ProgramData\Anaconda3\lib\site-packages\snakebite\kerberos.py in __init__(self)
     41 class Kerberos:
     42     def __init__(self):
---> 43         self.credentials = gssapi.Credentials(usage='initiate')
     44 
     45     def user_principal(self):

C:\ProgramData\Anaconda3\lib\site-packages\gssapi\creds.py in __new__(cls, base, token, name, lifetime, mechs, usage, store)
     61             base_creds = rcred_imp_exp.import_cred(token)
     62         else:
---> 63             res = cls.acquire(name, lifetime, mechs, usage,
     64                               store=store)
     65             base_creds = res.creds

C:\ProgramData\Anaconda3\lib\site-packages\gssapi\creds.py in acquire(cls, name, lifetime, mechs, usage, store)
    134 
    135         if store is None:
--> 136             res = rcreds.acquire_cred(name, lifetime,
    137                                       mechs, usage)
    138         else:

gssapi/raw/creds.pyx in gssapi.raw.creds.acquire_cred()

GSSError: Major (851968): Unspecified GSS failure.  Minor code may provide more information, Minor (39756044): Credential cache is empty

Any suggestions would be appreciated. Thank you.
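The traceback shows that the exception is not raised by HDFS itself but by `gssapi.Credentials(usage='initiate')` inside snakebite's `Kerberos` class, before any connection is attempted. "Credential cache is empty" means GSSAPI found no Kerberos ticket in the default credential cache. A minimal sketch that repeats only that call, so the Kerberos setup can be checked on its own, assuming the python-gssapi package that snakebite's Kerberos support uses (the helper name `has_initiator_credentials` is hypothetical):

```python
def has_initiator_credentials() -> bool:
    """Return True if GSSAPI can acquire initiator credentials.

    Repeats the call that fails in the traceback,
    gssapi.Credentials(usage='initiate'), without involving snakebite.
    """
    try:
        import gssapi
        from gssapi.exceptions import GSSError
    except ImportError:
        # python-gssapi is missing, so snakebite's SASL path cannot work either.
        print("python-gssapi is not installed")
        return False

    try:
        creds = gssapi.Credentials(usage="initiate")
    except GSSError as err:
        # For example "Credential cache is empty" when no ticket has been obtained.
        print(f"No usable Kerberos credentials: {err}")
        return False

    print(f"Found credentials for {creds.name}, lifetime {creds.lifetime} s")
    return True


if __name__ == "__main__":
    has_initiator_credentials()
```

If this prints the same "Credential cache is empty" message, the problem lies in the Kerberos environment rather than in the snakebite code.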

Asked by nikhil int; edited by OneCricketeer.

Comments:

  • `Credential cache is empty` sounds like you're maybe using Kerberos? If so, you'll need to `kinit` first before running your Python process – OneCricketeer Feb 04 '22 at 15:31
  • Hi @OneCricketeer, I don't follow. Could you explain how to connect to an HDFS cluster with Kerberos authentication using the snakebite-py3 library in Python? Thanks :) – nikhil int Feb 08 '22 at 05:46
  • You're using Kerberos, right? `kinit` is an external command that updates your "credential cache", which the `gssapi` module used internally by snakebite reads. Talk to your Hadoop administrator about getting Kerberos credentials/keytabs – OneCricketeer Feb 08 '22 at 15:12
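Following the comments, the order is: obtain a ticket with `kinit`, then create the snakebite `Client`. Below is a minimal sketch of that flow. It assumes MIT Kerberos `kinit` is on PATH. The principal, keytab path, and helper names are hypothetical placeholders, and the real values should come from the Hadoop administrator. It leaves out `effective_user` because, according to the traceback (`self.effective_user = effective_user or kerberos.user_principal()`), snakebite then falls back to the principal in the ticket cache.

```python
import subprocess
from typing import List, Optional


def kinit_command(principal: str, keytab: Optional[str] = None) -> List[str]:
    """Build an MIT Kerberos kinit command line (hypothetical helper).

    With a keytab, `-k -t <keytab>` authenticates without a password prompt;
    without one, kinit prompts for the principal's password.
    """
    if keytab:
        return ["kinit", "-k", "-t", keytab, principal]
    return ["kinit", principal]


def list_root_with_kerberos(host: str, principal: str,
                            keytab: Optional[str] = None) -> None:
    # Fill the credential cache first; gssapi (used by snakebite) reads it.
    subprocess.run(kinit_command(principal, keytab), check=True)

    # Imported lazily so the helpers above can be loaded without snakebite.
    from snakebite.client import Client

    # effective_user is omitted: snakebite uses the cached principal instead.
    client = Client(host=host, port=8020, use_sasl=True)
    for entry in client.ls(["/"]):
        print(entry)
```

Example call with placeholder values: `list_root_with_kerberos("hostname", "user@EXAMPLE.COM", keytab="user.keytab")`. Which `kinit` is on PATH and which credential cache python-gssapi reads both depend on the local Kerberos installation. This matters on Windows in particular, which is where the traceback's Anaconda install lives.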
