5

Hi, I am using the pywebhdfs Python library. I am connecting to EMR and trying to create a file on HDFS. I am getting the exception below, which seems unrelated to what I am doing, since I am not hitting any connection limit here. Is it due to how WebHDFS works?

from pywebhdfs.webhdfs import PyWebHdfsClient
hdfs = PyWebHdfsClient(host='myhost',port='50070', user_name='hadoop')
my_data = '01010101010101010101010101010101'
my_file = 'user/hadoop/data/myfile.txt'
hdfs.create_file(my_file, my_data)

throws:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='masterDNS', port=50070): Max retries exceeded with url: /webhdfs/v1/user/hadoop/data/myfile.txt?op=CREATE&user.name=hadoop (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 115] Operation now in progress',))
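For debugging, it can help to reproduce the URL from the traceback by hand and try it outside the library (for example with curl). The sketch below rebuilds that URL from the values in the question; `webhdfs_create_url` is a hypothetical helper written for illustration, not part of pywebhdfs.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(host, port, path, user_name):
    """Build the namenode URL for a WebHDFS CREATE request.

    Hypothetical helper: it only mirrors the URL shape seen in the
    traceback and does not send any request.
    """
    query = urlencode({"op": "CREATE", "user.name": user_name})
    # Drop a leading slash so the path joins cleanly after /webhdfs/v1/.
    return f"http://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


print(webhdfs_create_url("masterDNS", 50070, "user/hadoop/data/myfile.txt", "hadoop"))
```

If a plain HTTP request to this URL also fails to connect, the problem is likely in the network path or the service rather than in pywebhdfs itself.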

ahajib
Sam
  • Perhaps this may be of some assistance: https://translate.google.com/translate?hl=en&sl=zh-CN&u=http://91r.net/ask/34259099.html&prev=search. Seems to be regarding entering separate host entries to enable getting round the same URL issue causing the exception. Not a great solution to the core issue, but it may help you get round it, at least. – ManoDestra Mar 15 '16 at 16:45

4 Answers

0

I had this issue as well. I found that for some reason the call to:

send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):

is passed a timeout of 0, and that causes send to throw a

MaxRetryError

Bottom line, I found if you just set timeout = 1, it works fine:

hdfs = PyWebHdfsClient(host='yourhost', port='50070', user_name='hdfs', timeout=1)

Hope this works for you as well.
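A possible explanation, assuming the timeout of 0 really does reach the underlying socket: in Python, a socket timeout of 0 puts the socket into non-blocking mode, and on Linux a non-blocking connect returns immediately with errno 115 (EINPROGRESS, "Operation now in progress"), which matches the message in the question. The sketch below only demonstrates the timeout behaviour with the standard library; `is_nonblocking` is an illustrative name, and no network connection is made.

```python
import socket


def is_nonblocking(timeout):
    """Return True if a socket configured with this timeout is non-blocking."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # settimeout(0) is equivalent to setblocking(False).
        return not sock.getblocking()


print(is_nonblocking(0))     # timeout of 0: non-blocking mode
print(is_nonblocking(1))     # positive timeout: blocking with a deadline
print(is_nonblocking(None))  # no timeout: plain blocking mode
```

Any positive timeout keeps the socket in blocking mode, which would be consistent with timeout=1 making the error go away.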

Greg
0

Formatting the namenode solved this problem for me several times.

hdfs namenode -format
Angelo Di Donato
0

Please check the status of your connection. Run the command below on your host to see whether the WebHDFS port is listening:

netstat -an | grep 50070 | grep LIST

Please note:

  • If SSL is enabled, the port would be 50470.
  • hdfs namenode -format should not be run on the node, because it formats your namenode and you lose everything.
fcdt
Amit
-1

Maybe the WebHDFS service is not running on the host that you specified. You can check your cluster to see which host is running the WebHDFS service.
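One way to check this from the client side is to try a plain TCP connection to the WebHDFS port on each candidate host. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, and the host name and port in the usage comment are taken from the question.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (host name from the question; adjust for your cluster):
#   print(port_is_open("masterDNS", 50070))
```

A result of False for the namenode host suggests that the service is not listening there, or that a firewall or security group is blocking the port from the client machine.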