
I am trying to connect from Python (Jupyter) on Windows 10 to HDFS running in an Ubuntu VM. Can anybody help me with the connection error below? Thank you.

Package used: pywebhdfs. Environment: Ubuntu 18.04 (VM), Windows 10 (client).


from pywebhdfs.webhdfs import PyWebHdfsClient
from pprint import pprint

HDFS_CONNECTION = PyWebHdfsClient(host='localhost',port='9000', user_name='root-sai')

HDFS_CONNECTION.list_dir('hdfs"//localhost:9000/New')

Error:

ConnectionError: HTTPConnectionPool(host='localhost', port=9000): Max retries exceeded with url: /webhdfs/v1/hdfs%22//localhost%3A9000/New?op=LISTSTATUS&user.name=root-sai (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000250AB1FF438>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

1 Answer


The WebHDFS port is not the same as the HDFS RPC port (9000, used in the question). By default, WebHDFS listens on 50070.
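The `[WinError 10061]` in the question means the TCP connection itself was refused: nothing reachable from Windows was listening on that port. A quick standard-library sketch (a diagnostic aid, not part of pywebhdfs; `port_open` is a hypothetical helper) to see which of the two ports answers:

import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# 9000 is the HDFS RPC port; 50070 is the default WebHDFS (NameNode HTTP) port
for port in (9000, 50070):
    print(port, 'open' if port_open('localhost', port) else 'closed/refused')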

WebHDFS is enabled by default; if it has been disabled, add this property to hdfs-site.xml and restart HDFS for it to take effect:

<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>

You can test whether WebHDFS is enabled by making a curl request. The following checks the status of the /tmp directory; update the value of user.name as required:

curl -i "http://localhost:50070/webhdfs/v1/tmp?user.name=hadoop-user&op=GETFILESTATUS"
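The same check can be made from Python with requests (assuming it is installed); `/tmp` and `hadoop-user` are the same placeholders used in the curl command:

import requests

# GETFILESTATUS on /tmp through the WebHDFS REST API
resp = requests.get(
    'http://localhost:50070/webhdfs/v1/tmp',
    params={'user.name': 'hadoop-user', 'op': 'GETFILESTATUS'},
)
print(resp.status_code)
print(resp.json())  # FileStatus JSON on success, RemoteException on failure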

Then initialize the PyWebHdfsClient against the WebHDFS port:

HDFS_CONNECTION = PyWebHdfsClient(host='localhost', port='50070', user_name='root-sai')

HDFS_CONNECTION.list_dir('/New')
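`list_dir` returns the decoded LISTSTATUS response as a dict, nested as `FileStatuses` → `FileStatus` per the WebHDFS REST API; a minimal sketch for printing the entries:

listing = HDFS_CONNECTION.list_dir('/New')

# WebHDFS nests directory entries under FileStatuses/FileStatus
for status in listing['FileStatuses']['FileStatus']:
    print(status['pathSuffix'], status['type'], status['length'])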
  • Appreciating your help. After making the changes above I got an error: {"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File does not exist: /tmp"}} – Harsha Ragyari Apr 08 '20 at 06:43
  • I believe you got that error when trying to do `curl`. You can change the directory from `/tmp` to something else which you know exists. – franklinsijo Apr 08 '20 at 06:47
  • Yes: `curl -i "http://localhost:50070/webhdfs/v1//usr/local/Hadoop/etc/hadoop?user.name=root-sai&op=GETFILESTATUS"`. The error is the same: file doesn't exist. Please help; this is of high value for my career. – Harsha Ragyari Apr 08 '20 at 07:01
  • You should be passing a valid directory in HDFS. Assuming you have a directory `/New` in HDFS, try `curl -i "localhost:50070/webhdfs/v1/New?user.name=hadoop-user&op=GETFILESTATUS"` – franklinsijo Apr 08 '20 at 07:13
  • I changed it to the `New` folder which already exists in HDFS, but I got the same error. I really wish you could connect with me through my Facebook (Harsha Ragyari) to solve this. – Harsha Ragyari Apr 08 '20 at 07:29
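Following up on the comment thread: the path in the second curl attempt (`/usr/local/Hadoop/etc/hadoop`) looks like a local Ubuntu filesystem path rather than an HDFS path, which would explain the FileNotFoundException. A minimal existence check, assuming the client configured in the answer and that pywebhdfs raises `pywebhdfs.errors.FileNotFound` on a 404 (`hdfs_exists` is a hypothetical helper):

from pywebhdfs.errors import FileNotFound

def hdfs_exists(client, path):
    """Return True if GETFILESTATUS succeeds for the given HDFS path."""
    try:
        client.get_file_dir_status(path)
        return True
    except FileNotFound:
        return False

print(hdfs_exists(HDFS_CONNECTION, '/New'))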