I'm using wholeTextFiles to read a bunch of XML files from different folders, and some of these folders might be empty. Unfortunately, Spark throws an exception if any of these folders is empty:

org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern file:/path/*/*/*.xml matches 0 files

I've seen some ways to work around this issue when dealing with regular RDDs, like this one, but I couldn't find anything similar for wholeTextFiles.

I've looked a bit into the Spark code, and this method relies on several private classes, so it seems hard to change the behaviour. Any ideas?
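To make the kind of workaround I have in mind concrete, here is a minimal PySpark sketch. It assumes the inputs live on the local filesystem (as the file:/ scheme in the error suggests), so the standard glob module can check each pattern before Spark sees it. The helper names existing_inputs and whole_text_files_or_empty are hypothetical, and sc is assumed to be an already-created SparkContext. It relies on wholeTextFiles accepting a comma-separated list of paths.

```python
import glob


def existing_inputs(patterns):
    """Keep only the glob patterns that match at least one local file."""
    # Assumption: patterns are plain local paths (no URI scheme),
    # so glob.glob can expand them on the driver.
    return [p for p in patterns if glob.glob(p)]


def whole_text_files_or_empty(sc, patterns):
    """Read the patterns that have matches; return an empty RDD otherwise.

    `sc` is an existing SparkContext. This is a sketch, not Spark's own
    behaviour: it simply avoids handing Spark a pattern with no matches.
    """
    matched = existing_inputs(patterns)
    if not matched:
        # Nothing to read, so skip wholeTextFiles entirely.
        return sc.emptyRDD()
    # Join the surviving patterns into one comma-separated path string.
    return sc.wholeTextFiles(",".join(matched))
```

Usage would be something like whole_text_files_or_empty(sc, ["/path/a/*.xml", "/path/b/*.xml"]), with one pattern per folder rather than a single pattern spanning all of them. For HDFS or other non-local filesystems the glob check would have to go through the Hadoop FileSystem API instead, which this sketch does not cover.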

zero323
Luciano

0 Answers