
I am processing a single file with MapReduce. The file size is 1 GB and my default HDFS block size is 64 MB. For this example, how many input splits will there be, and how many mappers?

2 Answers

Number of splits = number of mappers.

So if your file size is 1 GB, you will have 1024 MB / 64 MB = 16 mappers running.

Your input split is different from the block. A block is a physical representation that contains the actual data, whereas an input split is just a logical representation that contains only the split length and the split location.
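
For illustration, here is a minimal sketch (using the new MapReduce API's FileSplit; the file path and sizes are made up) showing that a split is just a (path, offset, length) record rather than a copy of the data:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    // A minimal sketch: each FileSplit is only metadata (path, offset,
    // length, preferred hosts), not the block's data itself.
    public class SplitDemo {
        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024;       // 64 MB
            long fileSize = 1024L * 1024 * 1024;      // 1 GB
            Path file = new Path("/data/input.txt");  // hypothetical path

            // 1024 MB / 64 MB = 16 logical splits, one per block in this case
            for (long offset = 0; offset < fileSize; offset += blockSize) {
                long length = Math.min(blockSize, fileSize - offset);
                FileSplit split = new FileSplit(file, offset, length, new String[0]);
                System.out.println(split);  // prints path:offset+length
            }
        }
    }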

However, the number of mappers also depends on several factors:

  1. If your file is compressed in a format that is not splittable (gzip, for example), you will end up with one mapper processing the whole file.
  2. If isSplitable() in the InputFormat class returns false, your file is not splittable and, again, only one mapper will run (see the driver sketch after this list).
  3. Reducers have to be set explicitly in the driver code; job.setNumReduceTasks() does that. If it is not set, the number of reducers defaults to 1 (also shown in the sketch below).
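
Here is a minimal driver sketch covering points 2 and 3 (new MapReduce API; the paths and job name are hypothetical, and the mapper/reducer classes are omitted, so this is a skeleton rather than a complete job):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SplitControlDriver {

        // Point 2: returning false from isSplitable() forces one split,
        // and therefore one mapper, per input file.
        public static class WholeFileTextInputFormat extends TextInputFormat {
            @Override
            protected boolean isSplitable(JobContext context, Path file) {
                return false;
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "split-control");  // hypothetical name
            job.setJarByClass(SplitControlDriver.class);
            job.setInputFormatClass(WholeFileTextInputFormat.class);

            // Point 3: reducers must be set explicitly; the default is 1.
            job.setNumReduceTasks(4);

            FileInputFormat.addInputPath(job, new Path("/data/input"));     // hypothetical
            FileOutputFormat.setOutputPath(job, new Path("/data/output"));  // hypothetical
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }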

I think the number of input splits depends upon the input file size.


Number of blocks = number of mappers. If there is a single 1 GB file and a block size of 64 MB, the number of chunks (blocks) = 1024 MB / 64 MB = 16, so the number of mappers = 16. By default you get only one reducer; if you want to run more reducers, you can set job.setNumReduceTasks().
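
As a rough sketch of that arithmetic, assuming the split size equals the block size (the default when no min/max split size is configured):

    // Ceiling division: a 1 GB file over 64 MB blocks yields 16 splits/mappers.
    public class SplitCount {
        public static void main(String[] args) {
            long fileSize = 1024L * 1024 * 1024;  // 1 GB
            long blockSize = 64L * 1024 * 1024;   // 64 MB
            long splits = (fileSize + blockSize - 1) / blockSize;
            System.out.println(splits);  // 16 -> 16 mappers, 1 reducer by default
        }
    }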