I am running a parser script that parses .txt files from a local directory. These files have now been moved to an HDFS cluster, and I would like to configure PyCharm to access the HDFS cluster. Can someone assist me with this?
- What Python Hadoop library are you using? You will need one to access those files, as `open()` will not work – OneCricketeer Jul 14 '18 at 03:58
- Possible duplicate of [Python read file as stream from HDFS](https://stackoverflow.com/questions/12485718/python-read-file-as-stream-from-hdfs) – OneCricketeer Jul 14 '18 at 04:01
1 Answer
> I would like to configure my Pycharm to access the HDFS cluster
That depends on what type of access you're referring to. For the HDFS CLI basics, you can shell out to the `hadoop` command with the `os` module:
```python
# Not tested
import os

# Create a local file, push it to HDFS, then pull it back.
f = "{}/tmp.txt".format(os.getcwd())
cmds = [
    "touch {}".format(f),
    "hadoop fs -copyFromLocal {} /user/$USER/".format(f),
    "rm -fv {}".format(f),
    "hadoop fs -copyToLocal /user/$USER/tmp.txt $PWD/",
]
for cmd in cmds:
    os.system(cmd)

# The file exists locally again because the last command copied it back.
assert os.path.exists(f)
```
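If you keep the shell-out approach, note that `os.system` silently ignores failures. A minimal variant using `subprocess.run` with `check=True` raises an exception on any non-zero exit code instead (this assumes the `hadoop` CLI is on your `PATH`; the command shown in the usage comment is just an illustration):

```python
import subprocess

def run_checked(cmd):
    """Run a shell command, raising CalledProcessError on non-zero exit."""
    return subprocess.run(cmd, shell=True, check=True)

# Example (requires the hadoop CLI):
# run_checked("hadoop fs -ls /user/$USER/")
```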
But if you're looking for more granular control (reading file contents directly, listing directories, and so on), you'll want a library such as `pyarrow` (or the like).
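For example, a minimal sketch with `pyarrow`'s `HadoopFileSystem` might look like the following. This is not tested against a live cluster; `"namenode-host"` and the port `8020` are placeholders you would replace with your own NameNode address, and a working libhdfs/Hadoop client setup is assumed:

```python
def read_hdfs_text(path, host="namenode-host", port=8020):
    """Read a text file directly from HDFS, without a local copy.

    Placeholder host/port: replace with your cluster's NameNode.
    """
    # Lazy import so the sketch can be defined without pyarrow installed.
    from pyarrow import fs

    hdfs = fs.HadoopFileSystem(host, port)
    with hdfs.open_input_stream(path) as stream:
        return stream.read().decode("utf-8")

# Example (requires pyarrow and a reachable cluster):
# text = read_hdfs_text("/user/me/data.txt")
```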

- No, because we copy the file back with `"hadoop fs -copyToLocal /user/$USER/tmp.txt $PWD/"` – semore_1267 Jul 15 '18 at 17:48