Questions tagged [python-hdfs]

Use this tag for questions related to the Python package named HDFS.

8 questions
21 votes, 5 answers

What's the best module for interacting with HDFS with Python3?

I see there are hdfs3, snakebite, and some others. Which one is the best supported and most comprehensive?
Farhat
2 votes, 1 answer

How do I set the path of libhdfs.so for pyarrow?

I'm trying to use pyarrow and I keep getting the following error: ImportError: Can not find the shared library: libhdfs3.so. So I read some Stack Overflow answers, and they say that I need to set the environment variable ARROW_LIBHDFS_DIR. The path to…
Kush Singh
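pyarrow's HDFS support loads the JNI-based libhdfs.so when a connection is opened, and the pyarrow documentation describes ARROW_LIBHDFS_DIR as the way to point at it when it is not under $HADOOP_HOME/lib/native. (The libhdfs3.so named in the error belongs to the separate libhdfs3 driver offered by older pyarrow releases.) A minimal sketch, assuming the library sits in one of a few candidate directories you supply; the helper name set_libhdfs_dir is hypothetical:

```python
import os
from pathlib import Path


def set_libhdfs_dir(candidate_dirs):
    """Set ARROW_LIBHDFS_DIR to the first directory that contains libhdfs.so.

    Call this before opening a pyarrow HDFS connection, because pyarrow
    reads the variable when it loads libhdfs.
    """
    for directory in candidate_dirs:
        if (Path(directory) / "libhdfs.so").is_file():
            os.environ["ARROW_LIBHDFS_DIR"] = str(directory)
            return str(directory)
    searched = ", ".join(str(d) for d in candidate_dirs)
    raise FileNotFoundError(f"libhdfs.so not found in: {searched}")
```

Typical use is to call set_libhdfs_dir with something like $HADOOP_HOME/lib/native and then open pyarrow.fs.HadoopFileSystem("default"). pyarrow also expects JAVA_HOME, HADOOP_HOME and a CLASSPATH containing the Hadoop jars (for example the output of hdfs classpath --glob), so setting only ARROW_LIBHDFS_DIR may not be enough.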
1 vote, 0 answers

How to use hdfscli python library?

I have the following use case: I want to connect to a remote Hadoop cluster. So I got all the Hadoop conf files (core-site.xml, hdfs-site.xml and others) and stored them in one directory on the local file system. I got the correct keytab and krb5.conf file for…
Neil
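HdfsCLI (the hdfs package) talks to WebHDFS over HTTP and is configured with a NameNode URL rather than by reading Hadoop's *-site.xml files, so the local hdfs-site.xml is mostly useful for looking up that URL. A minimal sketch, assuming a single non-HA NameNode; the helper names read_hadoop_conf and webhdfs_url are hypothetical:

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(xml_data):
    """Return {name: value} for each <property> in a Hadoop *-site.xml document.

    xml_data may be str or bytes (for example Path("hdfs-site.xml").read_bytes()).
    """
    root = ET.fromstring(xml_data)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def webhdfs_url(hdfs_site, use_https=False):
    """Build a WebHDFS base URL from the NameNode HTTP(S) address.

    Assumes a single, non-HA NameNode; HA clusters store per-NameNode keys
    such as dfs.namenode.http-address.<nameservice>.<namenode-id> instead.
    """
    if use_https:
        return "https://" + hdfs_site["dfs.namenode.https-address"]
    return "http://" + hdfs_site["dfs.namenode.http-address"]
```

The resulting URL is what HdfsCLI's InsecureClient or KerberosClient takes as its first argument. If the address in the file is a bind address such as 0.0.0.0:9870, replace 0.0.0.0 with the NameNode's real hostname.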
1 vote, 0 answers

Writing JSON content to HDFS location using Python

I am trying to write JSON content to an HDFS location using Python, but for every key and value in my JSON content, I am seeing a prefix of u and ''. Original JSON content: { "id": 2344556, "resource_type": "user", "ext_uid": null, "email":…
Rahul
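The u'...' prefixes look like Python 2's repr of unicode strings, which usually means the dictionary was turned into text with str() instead of being serialized as JSON. A minimal sketch, assuming an HdfsCLI-style client whose write(hdfs_path, data=..., encoding=..., overwrite=...) method is used; the helper names to_json_text and write_json are hypothetical:

```python
import json


def to_json_text(record):
    """Serialize with json.dumps.

    str(record) would produce Python literal syntax instead (single quotes,
    None, and u'' prefixes on Python 2), which is not valid JSON.
    """
    return json.dumps(record, ensure_ascii=False)


def write_json(client, hdfs_path, record):
    """Write one JSON document to hdfs_path, replacing any existing file."""
    client.write(hdfs_path, data=to_json_text(record),
                 encoding="utf-8", overwrite=True)
```

Passing encoding="utf-8" lets the client encode the str payload itself, and ensure_ascii=False keeps non-ASCII characters readable in the stored file.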
0 votes, 0 answers

Remove only the file given in an HDFS path, not the entire HDFS path

I am trying to delete the file 20221229_20230221-101756_Backtest_M.txt in the HDFS path hdfs_path = '/dev/flux_entrant/depot/backtesting/'. To do it, I am using: fs =…
user8810618
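The usual fix is to build the full path of the file and delete that path alone, non-recursively, rather than passing the directory. A minimal sketch, assuming an HdfsCLI-style client whose delete(hdfs_path, recursive=False) refuses to remove a non-empty directory; the excerpt does not show which library fs comes from, and the helper name delete_single_file is hypothetical:

```python
import posixpath


def delete_single_file(client, directory, filename):
    """Delete directory/filename only, never the directory itself."""
    if not filename or "/" in filename:
        raise ValueError(f"expected a bare file name, got {filename!r}")
    # posixpath.join handles directories with or without a trailing slash.
    target = posixpath.join(directory, filename)
    # recursive=False so a mistaken directory path fails instead of wiping it.
    return client.delete(target, recursive=False)
```

With pyarrow's filesystem API, the equivalent call is fs.delete_file on the joined path, which deletes a single file.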
0 votes, 1 answer

How can I get past a connection error in pywebhdfs?

I have a locally hosted single-node Hadoop; my NameNode and DataNode are on the same machine. I'm trying to create a file using a Python library. self.hdfs = PyWebHdfsClient(host='192.168.231.130', port='9870', user_name='kush', …
Kush Singh
0 votes, 1 answer

Connect to HDFS with the keytab of a service ID using Python 3.6

I am trying the piece of code below to connect to HDFS and do some file-related operations. Please note I am trying to connect to a Cloudera HDFS instance from a CentOS 7 environment with Python 3.6 installed. import io from csv import…
Shanit
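With a service keytab, a common pattern is to obtain a Kerberos ticket with kinit first and then let a Kerberos-aware HTTP client use the ticket cache. A minimal sketch, assuming MIT Kerberos tools on the host and HdfsCLI installed with its Kerberos extra (pip install hdfs[kerberos]); the keytab path, principal and helper names are hypothetical:

```python
import os
import subprocess


def kinit_command(keytab_path, principal):
    """Build the kinit call that obtains a ticket from a keytab."""
    return ["kinit", "-kt", keytab_path, principal]


def kerberos_login(keytab_path, principal, krb5_conf=None):
    """Run kinit so Kerberos-aware clients in this process can use the ticket."""
    if krb5_conf:
        # KRB5_CONFIG selects which krb5.conf the MIT Kerberos tools read;
        # the kinit subprocess inherits it from os.environ.
        os.environ["KRB5_CONFIG"] = krb5_conf
    subprocess.run(kinit_command(keytab_path, principal), check=True)


def connect(webhdfs_url):
    """Return an HdfsCLI client that authenticates with SPNEGO/Kerberos."""
    from hdfs.ext.kerberos import KerberosClient  # imported lazily
    return KerberosClient(webhdfs_url)
```

The code avoids newer syntax, so it runs on Python 3.6; check=True makes a failed kinit raise instead of silently producing an unauthenticated client.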
0 votes, 1 answer

In python-hdfs, is there a way to use a wildcard or regex in the list method?

In Linux, with hadoop fs -ls I can use a wildcard (/sandbox/*), but the Python hdfs client's list method fails on this, treating it as an unknown path. Is there a different way to use wildcards in python-hdfs?
Ezer K
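HdfsCLI's Client.list takes a literal directory path and returns the names of its entries, and the WebHDFS LISTSTATUS operation it relies on does no globbing. A common workaround is to list the parent directory and filter the names client-side. A minimal sketch, assuming wildcards appear only in the last path component; the helper names glob_list and regex_list are hypothetical:

```python
import fnmatch
import posixpath
import re


def glob_list(client, pattern):
    """Emulate `hadoop fs -ls /dir/*.txt` on top of client.list(dir).

    Only the last path component may contain wildcards: the parent
    directory is listed literally and names are filtered on the client.
    """
    if not pattern.startswith("/"):
        raise ValueError("pattern must be an absolute HDFS path")
    parent, name_pattern = posixpath.split(pattern)
    return [posixpath.join(parent, name)
            for name in client.list(parent)
            if fnmatch.fnmatchcase(name, name_pattern)]


def regex_list(client, directory, regex):
    """Return paths in directory whose names fully match a regular expression."""
    compiled = re.compile(regex)
    return [posixpath.join(directory, name)
            for name in client.list(directory)
            if compiled.fullmatch(name)]
```

fnmatchcase is used because HDFS paths are case-sensitive; for patterns spanning several directory levels, the same idea can be applied one level at a time.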