We have a full HDFS backup using distcp that takes a long time to run. Some of the data on HDFS is "moving", that is, files are created and deleted while the backup runs. This causes mappers to fail with java.io.FileNotFoundException: No such file or directory. Such files are unimportant; we just want the backup to do the best it can.
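To make the failure mode concrete, here is a minimal local-filesystem sketch in Python. It does not use distcp or the HDFS API. It assumes the file list is built before copying starts, so a file deleted in between raises FileNotFoundError at copy time. The helpers list_files and copy_listed are hypothetical names for illustration. copy_listed shows the per-file "best effort" behaviour we are after: only the file that disappeared is skipped.

```python
import os
import shutil
import tempfile
from typing import List, Tuple


def list_files(src_root: str) -> List[str]:
    """Build the copy list up front (hypothetical helper, analogous to a copy listing)."""
    listing = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            listing.append(os.path.relpath(os.path.join(dirpath, name), src_root))
    return sorted(listing)


def copy_listed(listing: List[str], src_root: str, dst_root: str) -> Tuple[List[str], List[str]]:
    """Copy each listed file; skip any file that vanished after listing."""
    copied, skipped = [], []
    for rel in listing:
        src = os.path.join(src_root, rel)
        dst = os.path.join(dst_root, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        try:
            shutil.copy2(src, dst)
        except FileNotFoundError:
            # Deleted between listing and copy: skip just this file and continue.
            skipped.append(rel)
            continue
        copied.append(rel)
    return copied, skipped


if __name__ == "__main__":
    src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
    for name in ("a.txt", "b.txt", "c.txt"):
        with open(os.path.join(src, name), "w") as fh:
            fh.write(name)
    listing = list_files(src)
    os.remove(os.path.join(src, "b.txt"))  # simulate a "moving" file
    print(copy_listed(listing, src, dst))
```

Running it prints (['a.txt', 'c.txt'], ['b.txt']): the deleted file is reported and skipped while the rest of the copy still completes.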
Now, it seems that -i ("ignore failures") is not quite what we want, because it ignores failures at the map level rather than the file level: if a map task fails, all files assigned to that map task are ignored. What we want is for just the missing file to be ignored.
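The difference between the two granularities can be modelled in a few lines. This is a hypothetical toy model of the behaviour described above, not distcp's actual implementation: each inner list stands for the files assigned to one map task, and copy_one stands in for copying a single file, raising FileNotFoundError if the file is gone. The function names are illustrative only.

```python
from typing import Callable, Iterable, List

CopyFn = Callable[[str], None]


def copy_with_task_level_ignore(tasks: List[List[str]], copy_one: CopyFn) -> List[str]:
    """Task-level ignore: one missing file discards every file in its task."""
    kept: List[str] = []
    for files in tasks:
        done: List[str] = []
        try:
            for f in files:
                copy_one(f)
                done.append(f)
        except FileNotFoundError:
            continue  # the whole task is treated as failed
        kept.extend(done)
    return kept


def copy_with_file_level_ignore(tasks: List[List[str]], copy_one: CopyFn) -> List[str]:
    """File-level ignore: only the missing file is skipped."""
    kept: List[str] = []
    for files in tasks:
        for f in files:
            try:
                copy_one(f)
            except FileNotFoundError:
                continue  # skip this file, keep copying the rest of the task
            kept.append(f)
    return kept


def make_fake_copier(existing: Iterable[str]) -> CopyFn:
    """Hypothetical stand-in for a copy that fails on files no longer present."""
    present = set(existing)

    def copy_one(path: str) -> None:
        if path not in present:
            raise FileNotFoundError(path)

    return copy_one


if __name__ == "__main__":
    copy_one = make_fake_copier({"a", "c", "d"})  # "b" was deleted
    tasks = [["a", "b"], ["c", "d"]]              # two map tasks
    print(copy_with_task_level_ignore(tasks, copy_one))  # ['c', 'd']
    print(copy_with_file_level_ignore(tasks, copy_one))  # ['a', 'c', 'd']
```

With "b" deleted, the task-level variant also loses "a", while the file-level variant loses only "b", which is the behaviour we would like from the backup.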