I am working with the Hadoop and Spark frameworks to cluster images, using Python as my programming language. For the MapReduce part I use the mrjob package. My question is how to access HDFS files directly from Python. For example, if my file on HDFS is /a.txt, how do I access it directly in Python to apply further processing? I have looked at many libraries but have not found a concrete answer. I saw snakebite, but it only supports Python 2.
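One library-free workaround is to shell out to the Hadoop command-line client and read what `hdfs dfs -cat` prints. The sketch below assumes the `hdfs` client is installed, on `PATH`, and configured for your cluster. The function names `read_hdfs_text` and `read_hdfs_lines` are hypothetical, and the `cmd` parameter exists only so the code can be tried locally without a cluster.

```python
import subprocess
from typing import List, Sequence

# Assumption: the Hadoop CLI is on PATH; "hdfs dfs -cat <path>" prints an HDFS file to stdout.
HDFS_CAT: Sequence[str] = ("hdfs", "dfs", "-cat")


def read_hdfs_text(path: str, cmd: Sequence[str] = HDFS_CAT) -> str:
    """Return the contents of an HDFS file as a string.

    `cmd` is parameterised only so the function can be exercised locally
    (e.g. with plain `cat`) without a running cluster.
    """
    result = subprocess.run(
        [*cmd, path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        check=True,           # raise CalledProcessError if the command fails
        text=True,
        encoding="utf-8",
    )
    return result.stdout


def read_hdfs_lines(path: str, cmd: Sequence[str] = HDFS_CAT) -> List[str]:
    """Split the file into lines, ready for further processing."""
    return read_hdfs_text(path, cmd).splitlines()
```

On a cluster, `read_hdfs_lines("/a.txt")` would return the lines of `/a.txt`. Each call starts a separate `hdfs` (Java) process, so this suits occasional reads rather than tight loops.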
Why not read the file directly with PySpark? For example: `sc.textFile("hdfs:///your_path_to/a.txt")` – NicolasKittsteiner Sep 10 '18 at 20:32
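A minimal sketch of that suggestion, assuming PySpark is installed and Spark's Hadoop configuration points at your cluster, so that an `hdfs:///` URI with no host typically resolves to the default namenode (`fs.defaultFS`). The app name and the helpers `non_empty_lines` and `read_from_hdfs` are hypothetical, for illustration only.

```python
def non_empty_lines(lines):
    """Per-partition helper: strip whitespace and drop blank lines."""
    for line in lines:
        stripped = line.strip()
        if stripped:
            yield stripped


def read_from_hdfs(path="hdfs:///a.txt"):
    # Imported lazily so this module can be imported without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-hdfs-example").getOrCreate()
    sc = spark.sparkContext
    # textFile accepts hdfs:// URIs and returns an RDD with one element per line.
    rdd = sc.textFile(path).mapPartitions(non_empty_lines)
    # collect() pulls every line to the driver, so only do this for small files.
    return rdd.collect()
```

For large inputs you would keep working on the RDD (further `map`/`filter` steps) instead of calling `collect()`.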
https://stackoverflow.com/a/51548097/2308683 – OneCricketeer Sep 10 '18 at 21:52