What do I need to do to have smaller/larger blocks in Hadoop?
Concretely, I want a larger number of mappers, each of which gets a smaller piece of data to work on. It seems that I need to decrease the block size, but I'm confused (I'm new to Hadoop): do I need to do something when putting the file on HDFS, do I need to specify something related to the input split size, or both?
I'm sharing the cluster, so I cannot change global settings; I need this on a per-job basis, if possible. I'm running the job from code (and possibly from Oozie later).
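For context, here is a minimal sketch of how the split size (and so the mapper count) appears to be derived for file-based input. It assumes the job uses `FileInputFormat` from the new `org.apache.hadoop.mapreduce` API. The class and method names below are hypothetical and use only the Java standard library. The formula and the 1.1 slop factor are meant to mirror what `FileInputFormat` does, but they are an approximation, not Hadoop's actual code. If this is right, the number of mappers can be raised per job by capping the maximum split size, without changing the HDFS block size of the stored file.

```java
// Hypothetical stand-alone sketch; not part of Hadoop.
final class SplitSizeSketch {

    // Assumed to mirror FileInputFormat's split sizing:
    // the block size, clamped between the job's min and max split size.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of splits (one mapper per split) for a single
    // splittable file. The last split may be up to 10% larger than the
    // others (slop factor assumed to be 1.1).
    static int countSplits(long fileLength, long splitSize) {
        final double splitSlop = 1.1;
        long bytesRemaining = fileLength;
        int splits = 0;
        while ((double) bytesRemaining / splitSize > splitSlop) {
            splits++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long fileLength = 1024 * mib;   // example: a 1 GiB file
        long blockSize = 128 * mib;     // example: 128 MiB HDFS blocks

        // Default: no cap on the split size, so splits follow the blocks.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println("default mappers: " + countSplits(fileLength, defaultSplit));

        // Per-job cap of 32 MiB. In a real job this would be set with
        // FileInputFormat.setMaxInputSplitSize(job, 32 * mib), or with the
        // property mapreduce.input.fileinputformat.split.maxsize
        // (Hadoop 2.x name; assumed to apply to the cluster's version).
        long cappedSplit = computeSplitSize(blockSize, 1L, 32 * mib);
        System.out.println("capped mappers:  " + countSplits(fileLength, cappedSplit));
    }
}
```

Because the cap is a job-level configuration value, it should not require touching cluster-wide settings, and the same property could presumably be passed through an Oozie action's configuration. Changing the actual HDFS block size is a separate, per-file decision made when the file is written.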