
I want to walk through a given HDFS path recursively in PySpark without using hadoop fs -ls [path]. I tried the solution suggested here, but found that listStatus() only returns the status of the first sub-directory in the given path. According to this documentation, listStatus() should return "the statuses of the files/directories in the given path if the path is a directory." What am I missing?
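For reference, this is roughly how I'm calling it (a minimal sketch; the path is a placeholder, and the spark._jvm route is Spark's internal Py4J gateway rather than a public API):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Reach the Hadoop FileSystem API through Spark's internal Py4J gateway.
hadoop = spark._jvm.org.apache.hadoop
fs = hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())

# Placeholder path. Per the docs, listStatus() should return the statuses
# of all files/directories directly under it when it is a directory.
for status in fs.listStatus(hadoop.fs.Path("/some/hdfs/path")):
    print(status.getPath().toString())
```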

I'm using Hadoop 2.9.2, Spark 2.3.2 and Python 2.7.

largecats

1 Answer


I couldn't exactly recreate the scenario, but I think it has something to do with the fact that if a path is not a directory, listStatus() on that path will return a list of length 1 containing only the status of that path itself.
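Here is a minimal sketch of a recursive walk that accounts for this, reusing the fs handle and hadoop module from the question; isDirectory() decides whether to recurse into a sub-directory or yield the file itself:

```python
def walk(fs, path):
    """Recursively yield the full path of every file under path."""
    for status in fs.listStatus(path):
        if status.isDirectory():
            # Recurse: status.getPath() is an org.apache.hadoop.fs.Path,
            # which listStatus() accepts directly.
            for p in walk(fs, status.getPath()):
                yield p
        else:
            # A plain file: calling listStatus() on it would return only
            # its own status (the length-1 list above), so yield it here.
            yield status.getPath().toString()

# Example usage, with the handles from the question:
# for f in walk(fs, hadoop.fs.Path("/some/hdfs/path")):
#     print(f)
```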

largecats