I want to walk a given HDFS path recursively in PySpark without calling `hadoop fs -ls [path]`. I tried the solution suggested here, but found that `listStatus()` only returns the status of the first sub-directory in the given path. According to this documentation, `listStatus` should return "the statuses of the files/directories in the given path if the path is a directory." What am I missing?
I'm using Hadoop 2.9.2, Spark 2.3.2 and Python 2.7.
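For reference, this is a minimal sketch of the kind of recursive listing I'm attempting through the Spark JVM gateway (the helper name `walk_hdfs` and the path `/some/hdfs/path` are just placeholders for illustration):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Reach the Hadoop FileSystem API through the JVM gateway
    hadoop = spark.sparkContext._jvm.org.apache.hadoop
    conf = spark.sparkContext._jsc.hadoopConfiguration()
    fs = hadoop.fs.FileSystem.get(conf)

    def walk_hdfs(path):
        """Recursively yield file paths under the given HDFS path."""
        for status in fs.listStatus(hadoop.fs.Path(path)):
            if status.isDirectory():
                # Python 2.7, so no `yield from`
                for p in walk_hdfs(status.getPath().toString()):
                    yield p
            else:
                yield status.getPath().toString()

    for f in walk_hdfs("/some/hdfs/path"):
        print(f)

My expectation was that each `listStatus()` call would return one `FileStatus` per entry directly under the path, but in practice I only seem to get the first sub-directory back.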