I'm trying out snakebite. I started the following client:

from snakebite.client import Client
client = Client("my.host.com", 8020, effective_user='datascientist')

First, I tried to list the user's directory:

for x in client.ls(['/user/datascientist']):
    print(x)

This worked nicely and printed a couple of dictionaries, one for each item in the directory. One of the items is a file foobar.txt whose contents I'd like to see. To that end, I believe I should use Client.cat:

for cat in client.cat(['/user/datascientist/da-foobar.txt',]):
    print(cat)
    for item in cat:
        print(item)

However, this didn't work. I got the following error message:

ConnectionFailureException: Failure to connect to data node at (10.XXX.YYY.ZZZ:50010)

What am I doing wrong?

BTW: using PyWebHdfsClient from pywebhdfs.webhdfs I managed to see the file by starting a client with the same address but with port 50070. I don't know whether this is relevant or not.

Edit 1: I also tried to use snakebite.client.Client.text and got the same error. I guess this is not surprising.

BTW, the file's content is my file is this\ntest file.

Dror

1 Answer

I found a solution. It seems that the listing operation can be served by the name node alone. In contrast, printing the text file requires the client to connect to the data nodes. By instantiating the client as follows

client = Client("stage-gap-namenode-2.srv.glispa.com", 8020, effective_user='datascientist', 
                use_datanode_hostname=True)

the cat operation works, because the client then connects to the data nodes by hostname rather than by their internal IP addresses. I summarized a minimal example.
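For reference, here is a rough sketch of how the fix and the read could be wrapped together. The helper names read_hdfs_text and make_client are made up for illustration and are not part of snakebite. The sketch assumes that Client.cat yields one iterable of chunks per requested path, as in the loop from the question, and that chunks may arrive as bytes depending on the snakebite and Python version.

```python
def read_hdfs_text(client, path):
    """Return the content of one HDFS file as a single string.

    `client` is expected to behave like snakebite.client.Client:
    cat() takes a list of paths and yields, per path, an iterable of chunks.
    """
    chunks = []
    for file_chunks in client.cat([path]):
        for chunk in file_chunks:
            # Assumption: chunks may be bytes, depending on the
            # snakebite / Python version in use; decode them as UTF-8.
            if isinstance(chunk, bytes):
                chunk = chunk.decode("utf-8")
            chunks.append(chunk)
    return "".join(chunks)


def make_client(host, port=8020, user=None):
    """Hypothetical helper: build a client that reaches data nodes by hostname."""
    # Imported lazily so that read_hdfs_text can be used without snakebite installed.
    from snakebite.client import Client
    return Client(host, port, effective_user=user, use_datanode_hostname=True)
```

With a reachable cluster, something like read_hdfs_text(make_client("my.host.com", user="datascientist"), "/user/datascientist/da-foobar.txt") should then return the file's text. This only helps if the data node hostnames can be resolved from the machine running the client.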

Dror