I'm using FileInputFormat.addInputPath to specify the path to the input files for my Hadoop job. I've found that if I have x files in my input directory, x mappers are started over the course of the whole job.
Is there any way to specify which input files are assigned to which node, so that I can control which machine processes a given set of input files?
The reason I'm asking is that I'm working with a heterogeneous cluster, and I want to balance the workload as evenly as possible.
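
To make the goal concrete, here is a minimal sketch of the kind of assignment I have in mind. It is plain Java and does not use any Hadoop API; the class name `WeightedAssignment`, the node names, and the capacity weights are all hypothetical, made up for illustration. It greedily gives each file to the node whose load, relative to its capacity, would be lowest afterwards.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: not part of Hadoop. Names and weights are assumptions.
public class WeightedAssignment {

    // Assign each file to the node that would have the lowest
    // (load / capacity weight) after receiving it. Largest files go first.
    static Map<String, List<String>> assign(Map<String, Long> fileSizes,
                                            Map<String, Double> nodeWeights) {
        if (nodeWeights.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        Map<String, Double> load = new LinkedHashMap<>();
        for (String node : nodeWeights.keySet()) {
            plan.put(node, new ArrayList<>());
            load.put(node, 0.0);
        }

        // Sort files by size, descending (the sort is stable for equal sizes).
        List<Map.Entry<String, Long>> files = new ArrayList<>(fileSizes.entrySet());
        files.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        for (Map.Entry<String, Long> file : files) {
            String best = null;
            double bestCost = Double.MAX_VALUE;
            for (String node : nodeWeights.keySet()) {
                // Relative load this node would have if it took the file.
                double cost = (load.get(node) + file.getValue()) / nodeWeights.get(node);
                if (cost < bestCost) {
                    bestCost = cost;
                    best = node;
                }
            }
            plan.get(best).add(file.getKey());
            load.put(best, load.get(best) + file.getValue());
        }
        return plan;
    }

    public static void main(String[] args) {
        // Hypothetical input: four equally sized files, one node 3x faster.
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("a", 100L);
        files.put("b", 100L);
        files.put("c", 100L);
        files.put("d", 100L);

        Map<String, Double> nodes = new LinkedHashMap<>();
        nodes.put("fast", 3.0);
        nodes.put("slow", 1.0);

        System.out.println(assign(files, nodes));
    }
}
```

With these made-up inputs, the faster node ends up with three of the four files and the slower node with one. What I don't know is how to get Hadoop to honor a plan like this when it schedules the mappers.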