Questions tagged [pyhdfs-client]

Use this tag for questions about HDFS clients written in Python.

13 questions
6
votes
1 answer

How to import "HdfsClient" in Python 3?

I'm new to Python and am trying to connect to a Hadoop HDFS system. I tried to implement the following reference code, but it shows an error while importing the package: from pyarrow import HdfsClient # Using libhdfs hdfs =…
David
5
votes
4 answers

ConnectionError(MaxRetryError("HTTPConnectionPool Max retries exceeded using pywebhdfs

Hi, I am using the pywebhdfs Python library. I am connecting to EMR and trying to create a file on HDFS. I am getting the exception below, which seems unrelated to what I am doing, as I am not hitting any connection limit here. Is it due to how…
Sam
1
vote
0 answers

How to use the hdfscli Python library?

I have the following use case: I want to connect to a remote Hadoop cluster. So I got all the Hadoop conf files (core-site.xml, hdfs-site.xml and others) and stored them in one directory on the local file system. I got the correct keytab and krb5.conf file for…
Neil
1
vote
0 answers

Apache Arrow connectivity issue with HDFS (Remote file-system)

I want to use pyarrow to read and write Parquet files in HDFS, but I am facing a connectivity issue. I installed pyarrow and pandas, and now I am trying to connect to HDFS on a remote machine. Reference link -…
UDIT JOSHI
1
vote
1 answer

Pyhdfs copy_from_local causing nodename nor servname provided, or not known error

I am using the following Python code to upload a file from my local system to a remote HDFS using pyhdfs: from pyhdfs import HdfsClient client =…
Sunil Rao
0
votes
0 answers

How can I connect a Django project with HDFS?

I want to connect my Django project with HDFS to store data. How can I do that?
0
votes
1 answer

Writing to Kerberized HDFS using Python | Max retries exceeded with url

I am trying to use Python to write to a secure HDFS using the following lib (link). Authentication part: def init_kinit(): kinit_args = ['/usr/bin/kinit', '-kt', '/tmp/xx.keytab', 'kerberos_principle'] subp = Popen(kinit_args,…
0
votes
1 answer

Can pyhdfs make a 'soft' delete?

I am using: from pyhdfs import HdfsClient fs = HdfsClient(hosts=..., user_name='hdfs', ..) fs.delete(path_table, recursive=True). However, after I deleted the directory, I could not find it in the trash directory located in…
user2894829
0
votes
1 answer

How can I get past a connection error in pywebhdfs?

I have a locally hosted single-node Hadoop setup; my namenode and datanode are the same. I'm trying to create a file using a Python library. self.hdfs = PyWebHdfsClient(host='192.168.231.130', port='9870', user_name='kush', …
Kush Singh
0
votes
0 answers

Spark: parallelize HDFS URLs with data locality awareness

I have a list of HDFS zip file URLs, and I want to open each file inside an RDD map function instead of using the binaryFiles function. Initially, I tried the following: def unzip(hdfs_url): # read the hdfs file using hdfs python client rdd =…
gunturu mahesh
0
votes
1 answer

How to save an incoming file in a Bottle API to HDFS

I am defining a Bottle API where I need to accept a file from the client and then save that file to HDFS on the local system. The code looks something like this: @route('/upload', method='POST') def do_upload(): import pdb; pdb.set_trace() …
Keyur Golani
0
votes
0 answers

Python HDFS: Cannot parse JSON document

I am following the simple piece of code from the documentation at http://hdfscli.readthedocs.org/en/latest/quickstart.html: with client.read(path, encoding='utf-8') as reader: print reader from json import load model =…
AbtPst
-1
votes
3 answers

Remove the tuple and create a new sorted list

I have an RDD, created using PySpark, that is around 600 GB after joining by key and looks exactly like this: [('43.72_-70.08', (('0744632', -70.08, 43.72, '2.4'), '18090865')), ('43.72_-70.08', (('0744632', -70.08, 43.72, '2.4'),…
Sami