
I am processing a single file with MapReduce. The file size is 1 GB and my default HDFS block size is 64 MB. For this example, how many input splits will there be, and how many mappers?

2 Answers

Number of splits = number of mappers.

So if your file size is 1 GB, you will have 1024 MB / 64 MB = 16 mappers running.

Your input split is different from the block. A block is a physical representation that contains the actual data, whereas an input split is just a logical representation that contains only the split length and the split location.
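
For illustration, here is a minimal sketch (using the new MapReduce API's FileSplit; the file path and sizes are made up) showing that a split is just a (path, offset, length) record rather than a copy of the data:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    // A minimal sketch: each FileSplit is only metadata (path, offset,
    // length, preferred hosts), not the block's data itself.
    public class SplitDemo {
        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024;       // 64 MB
            long fileSize = 1024L * 1024 * 1024;      // 1 GB
            Path file = new Path("/data/input.txt");  // hypothetical path

            // 1024 MB / 64 MB = 16 logical splits, one per block in this case
            for (long offset = 0; offset < fileSize; offset += blockSize) {
                long length = Math.min(blockSize, fileSize - offset);
                FileSplit split = new FileSplit(file, offset, length, new String[0]);
                System.out.println(split);  // prints path:offset+length
            }
        }
    }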

However, the number of mappers also depends on several factors:

  1. If your file is compressed in a format that is not splittable (gzip, for example), you will end up with one mapper processing the whole file.
  2. If isSplitable() in the InputFormat class returns false, your file is not splittable and, again, only one mapper will run (see the driver sketch after this list).
  3. Reducers have to be set explicitly in the driver code; job.setNumReduceTasks() does that. If it is not set, the number of reducers defaults to 1 (also shown in the sketch below).
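
Here is a minimal driver sketch covering points 2 and 3 (new MapReduce API; the paths and job name are hypothetical, and the mapper/reducer classes are omitted, so this is a skeleton rather than a complete job):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SplitControlDriver {

        // Point 2: returning false from isSplitable() forces one split,
        // and therefore one mapper, per input file.
        public static class WholeFileTextInputFormat extends TextInputFormat {
            @Override
            protected boolean isSplitable(JobContext context, Path file) {
                return false;
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "split-control");  // hypothetical name
            job.setJarByClass(SplitControlDriver.class);
            job.setInputFormatClass(WholeFileTextInputFormat.class);

            // Point 3: reducers must be set explicitly; the default is 1.
            job.setNumReduceTasks(4);

            FileInputFormat.addInputPath(job, new Path("/data/input"));     // hypothetical
            FileOutputFormat.setOutputPath(job, new Path("/data/output"));  // hypothetical
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }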

I think the number of input splits depends upon the input file size.


Number of blocks = number of mappers. If there is a single 1 GB file and a block size of 64 MB, the number of chunks (blocks) = 1024 MB / 64 MB = 16, so the number of mappers = 16. By default you get only one reducer; if you want to run more reducers, you can set job.setNumReduceTasks().
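
As a rough sketch of that arithmetic, assuming the split size equals the block size (the default when no min/max split size is configured):

    // Ceiling division: a 1 GB file over 64 MB blocks yields 16 splits/mappers.
    public class SplitCount {
        public static void main(String[] args) {
            long fileSize = 1024L * 1024 * 1024;  // 1 GB
            long blockSize = 64L * 1024 * 1024;   // 64 MB
            long splits = (fileSize + blockSize - 1) / blockSize;
            System.out.println(splits);  // 16 -> 16 mappers, 1 reducer by default
        }
    }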