
We have a full HDFS backup that uses distcp and takes a long time to run. Some of the data on HDFS is "moving": files are created and deleted while the copy is running. As a result, mappers fail with java.io.FileNotFoundException: No such file or directory. Such files are unimportant; we just want the backup to do the best it can.

However, -i ("ignore failures") does not seem to be quite what we want, because it ignores failures at the map level rather than the file level: if a map task fails, all files assigned to that map task are skipped. What we want is for only the missing file to be ignored.
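To make the map-level vs. file-level distinction concrete, here is a toy model in plain Java. It is not distcp's actual code. It only follows the behavior described above: mapLevelIgnore drops every file in a task once any file in that task fails, while fileLevelIgnore skips just the missing file. The class, the method names, the copyFile stand-in, and the sample paths are all hypothetical.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: each "task" is a list of file paths, standing in for a distcp map task.
public class IgnoreGranularity {

    // Hypothetical stand-in for copying one file; throws if the file has vanished.
    static void copyFile(String path, Set<String> existing) throws FileNotFoundException {
        if (!existing.contains(path)) {
            throw new FileNotFoundException(path);
        }
    }

    // Map-level ignore (the behavior the question attributes to -i):
    // one failure discards the whole task, so none of its files count as copied.
    static List<String> mapLevelIgnore(List<List<String>> tasks, Set<String> existing) {
        List<String> copied = new ArrayList<>();
        for (List<String> task : tasks) {
            List<String> taskOutput = new ArrayList<>();
            try {
                for (String f : task) {
                    copyFile(f, existing);
                    taskOutput.add(f);
                }
                copied.addAll(taskOutput); // only reached if every file in the task succeeded
            } catch (FileNotFoundException e) {
                // The failed task is ignored as a whole; its other files are lost too.
            }
        }
        return copied;
    }

    // File-level ignore (the desired behavior): skip only the file that vanished.
    static List<String> fileLevelIgnore(List<List<String>> tasks, Set<String> existing) {
        List<String> copied = new ArrayList<>();
        for (List<String> task : tasks) {
            for (String f : task) {
                try {
                    copyFile(f, existing);
                    copied.add(f);
                } catch (FileNotFoundException e) {
                    // Missing file: skip it and keep going with the rest of the task.
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        List<List<String>> tasks = List.of(List.of("/a", "/b"), List.of("/c", "/gone", "/d"));
        Set<String> existing = Set.of("/a", "/b", "/c", "/d");
        System.out.println(mapLevelIgnore(tasks, existing));  // [/a, /b]
        System.out.println(fileLevelIgnore(tasks, existing)); // [/a, /b, /c, /d]
    }
}
```

With map-level ignoring, /c and /d are lost only because they happened to share a task with /gone. With file-level ignoring, just /gone is skipped.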

samthebest
