I am trying to use Python to write to a secure (Kerberos-enabled) HDFS cluster using the following library: link

Authentication part:

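from subprocess import Popen, PIPE

def init_kinit():
    # Obtain a Kerberos ticket from the keytab before creating the client.
    kinit_args = ['/usr/bin/kinit', '-kt', '/tmp/xx.keytab',
                  'kerberos_principal']
    subp = Popen(kinit_args, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    # communicate() drains the pipes, so kinit cannot block on a full buffer.
    subp.communicate()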
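
Note that the function above discards the exit code of kinit, so a bad keytab or principal would go unnoticed until the upload fails. A minimal sketch of a checked variant, using only the standard library; `run_checked` is a hypothetical helper name, not part of any Kerberos or HDFS library, and the keytab path and principal are placeholders:

```python
import subprocess

def run_checked(args):
    """Run a command and raise if it exits with a non-zero status."""
    result = subprocess.run(args, stdin=subprocess.DEVNULL,
                            capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{args[0]} failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}")
    return result.stdout

def init_kinit():
    # Placeholder keytab path and principal; substitute your own values.
    run_checked(['/usr/bin/kinit', '-kt', '/tmp/xx.keytab',
                 'kerberos_principal'])
```

With this variant, a failing kinit raises immediately with its stderr output instead of surfacing later as an authentication error.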

client/upload part:

from hdfs.ext.kerberos import KerberosClient

# session is assumed to be a requests.Session created elsewhere.
client = KerberosClient(url='http://xx.com:port', session=session,
                        mutual_auth="REQUIRED")
client.upload('/hdfspath/file.parquet',
              '/localpath/file.parquet')

Here is the error:

    requests.exceptions.ConnectionError: HTTPConnectionPool(host='xxx', port=xxx): 
Max retries exceeded with url: /webhdfs/v1/user/xxx/xxx.parquet?op=LISTSTATUS (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f499c104d30>: Failed to establish a new connection: [Errno 111] Connection refused'))

I tried the suggestions in the following link and made sure that dfs.webhdfs.enabled is set to true.

1 Answer

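It turned out that our cluster uses an HTTPS-only policy, so WebHDFS was not being served over plain HTTP on the port I was using. After changing the port and protocol in the client URL, it worked just fine. The relevant setting: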

<property>
<name>dfs.http.policy</name>
<value>HTTPS_ONLY</value>
</property>
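
If you want to pick the URL scheme from the config rather than by hand, the policy can be read programmatically. A rough sketch, assuming the property sits in a standard hdfs-site.xml with a `<configuration>` root; `webhdfs_scheme` is a hypothetical helper, and treating every value other than HTTPS_ONLY as plain HTTP is a simplification (with HTTP_AND_HTTPS either scheme should work):

```python
import xml.etree.ElementTree as ET

def webhdfs_scheme(hdfs_site_xml):
    """Return 'https' if dfs.http.policy is HTTPS_ONLY, else 'http'."""
    root = ET.fromstring(hdfs_site_xml)
    # iter() also matches the root, so a bare <property> snippet works too.
    for prop in root.iter('property'):
        if prop.findtext('name', '').strip() == 'dfs.http.policy':
            policy = prop.findtext('value', '').strip()
            return 'https' if policy == 'HTTPS_ONLY' else 'http'
    # Hadoop's default policy is HTTP_ONLY when the property is absent.
    return 'http'

sample = """<configuration>
  <property>
    <name>dfs.http.policy</name>
    <value>HTTPS_ONLY</value>
  </property>
</configuration>"""

print(webhdfs_scheme(sample))  # prints: https
```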

Updated client code:

client = KerberosClient(url='https://xx.com:port', session=session,
                            mutual_auth="REQUIRED")
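
Since "Connection refused" (Errno 111) only says that nothing accepted the TCP connection on the given host and port, it can help to probe the candidate ports before editing the URL. A small sketch using only the standard library; `port_is_open` is a hypothetical helper, and you would call it with your own NameNode host and ports:

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (Errno 111), timeouts and DNS failures.
        return False
```

Calling it once with the HTTP port and once with the HTTPS port shows which one is actually listening. A successful connect only proves that the port is open, not that WebHDFS or TLS is correctly configured behind it.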