I have the following use case:
I wanted to connect to a remote Hadoop cluster. So I got all the Hadoop conf files (core-site.xml, hdfs-site.xml and others) and stored them in a directory on my local file system. I also got the correct keytab and krb5.conf file for Kerberos authentication. I installed Hadoop by untarring the distribution under some directory, say /User/xyz/hadoop.
I set the following environment variables: JAVA_HOME, HADOOP_HOME, HADOOP_CONF_DIR, and finally placed my krb5.conf file under /etc/.
This setup let me authenticate successfully using kinit -kt <keytab> <principal user> and run Hadoop commands like hadoop fs -ls / from my local terminal to access the cluster.
However, I want to perform the same actions without downloading Hadoop at all. Is there a way? I am using Python and came across the hdfs Python library (HdfsCLI), but I had a hard time understanding and working with it.
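Here is roughly what I tried with the library's Kerberos extension. The NameNode host and WebHDFS port below are placeholders (I would expect the real values to come from my core-site.xml / hdfs-site.xml), and I am assuming the extra dependencies are installed with pip install hdfs[kerberos]:

```python
from hdfs.ext.kerberos import KerberosClient

# Assumes a valid Kerberos ticket already exists in the cache,
# i.e. `kinit -kt <keytab> <principal user>` was run beforehand.
# 'namenode.example.com:50070' is a placeholder for the NameNode's
# WebHDFS (HTTP) address.
client = KerberosClient('http://namenode.example.com:50070')

# Rough equivalent of `hadoop fs -ls /`
print(client.list('/'))
```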
- Is what I am trying to achieve possible?
- If so, what is the right way?
- Can someone guide me on setting up the hdfscli library with the right configuration? (My attempt at a configuration is shown below.)
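For completeness, this is how I understand the alias-based configuration is supposed to work. The alias name, URL, and file contents are my own guesses from the docs, not something I have confirmed against the cluster:

```python
from hdfs import Config

# Assumes a ~/.hdfscli.cfg file along these lines (alias and URL are placeholders):
#
#   [global]
#   default.alias = remote
#   autoload.modules = hdfs.ext.kerberos
#
#   [remote.alias]
#   url = http://namenode.example.com:50070
#   client = KerberosClient
#
client = Config().get_client('remote')
print(client.list('/'))
```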