I'm using wholeTextFiles to read a bunch of XML files from different folders, and some of these folders might be empty. Unfortunately, Spark throws an exception if any of them are empty:
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern file:/path/*/*/*.xml matches 0 files
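For context, this is roughly the call that fails; the SparkContext setup, app name, and master are just placeholders, and the glob is the same one that appears in the exception:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal setup; the app name and master are placeholders
val conf = new SparkConf().setAppName("readXml").setMaster("local[*]")
val sc = new SparkContext(conf)

// Each (path, content) pair should be one whole XML file, but the job
// dies with InvalidInputException as soon as an action runs if the glob
// ends up matching no files
val xmlFiles = sc.wholeTextFiles("file:/path/*/*/*.xml")
println(xmlFiles.count())
```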
I've seen some ways of working around this issue when dealing with regular RDDs, like this one, but I couldn't find anything similar for wholeTextFiles.
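The closest thing I can think of is expanding the glob myself with Hadoop's FileSystem API and only passing Spark the paths that actually exist. This is an untested sketch (it reuses sc from the snippet above, and the "no files" message is just a placeholder), so I'm not sure it's the right approach:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Expand the glob up front so empty folders never reach wholeTextFiles
val pattern = new Path("file:/path/*/*/*.xml")
val fs: FileSystem = pattern.getFileSystem(sc.hadoopConfiguration)

// globStatus returns null if nothing along the pattern exists, or an
// empty array if the folders are there but contain no matching files
val matched = fs.globStatus(pattern)

if (matched != null && matched.nonEmpty) {
  // wholeTextFiles accepts a comma-separated list of input paths
  val xmlFiles = sc.wholeTextFiles(matched.map(_.getPath.toString).mkString(","))
  println(xmlFiles.count())
} else {
  println("glob matched no XML files; nothing to read")
}
```

That would avoid the exception, but it feels clunky compared to wholeTextFiles simply tolerating empty matches.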
I've also looked a bit into the Spark code, and this method uses a bunch of private classes, so it seems hard to change the behaviour. Any ideas?