I have this query. Let's say I have a cluster of 3 DataNode+NodeManager machines with a replication factor of 3. A file stored on the cluster has 4 blocks, so by default 4 mappers will run in parallel. Since the replication factor is 3, will 12 mappers be running at the beginning?
1 Answer
The number of blocks depends on the file size. A 1 GB file makes 8 blocks (of 128 MB each).
All 8 blocks are replicated three times, following data locality and rack awareness, but that does not mean all 24 (8 x 3) block replicas are processed when you run a job against this file. Replication exists to recover from scenarios such as disk failures.
So to answer your questions:
Number of mappers = number of input splits (in most cases, the number of blocks).
So only 8 mappers will run on the cluster. Hadoop decides which nodes those mappers run on based on data locality, scheduling each mapper as close as possible to a copy of its block.
The behaviour differs if speculative execution is enabled for the cluster; see hadoop-speculative-task-execution.
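The arithmetic above can be sketched in a few lines. This is illustrative only: the function name and signature are made up for this example, not a Hadoop API, and it assumes the defaults from the answer (128 MB blocks, replication factor 3) and that each input split corresponds to one block.

```python
# Hypothetical sketch of how block count, stored replicas, and mapper
# count relate for a file in HDFS. Not an actual Hadoop API.
import math

def plan(file_size_mb: int, block_size_mb: int = 128, replication: int = 3):
    # HDFS splits the file into fixed-size blocks (last one may be partial).
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Each block is stored `replication` times, purely for fault tolerance.
    replicas_stored = blocks * replication
    # MapReduce launches one mapper per input split (here, one per block),
    # regardless of how many replicas of each block exist.
    mappers = blocks
    return blocks, replicas_stored, mappers

blocks, replicas, mappers = plan(1024)  # 1 GB file
print(blocks, replicas, mappers)  # 8 blocks, 24 stored replicas, 8 mappers
```

The point the numbers make: replication multiplies storage (24 replicas), not work (still 8 mappers).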
