A file of size 260 MB is stored in HDFS, and the HDFS default block size is 64 MB. When I run a MapReduce job against this file, it creates only 4 input splits. How is that calculated? Where is the remaining 4 MB? Any input is much appreciated.
1 Answer
An input split is NOT always a block. An input split is a logical representation of the data. Your input splits could have been 63 MB, 67 MB, 65 MB, 65 MB (or other sizes, depending on the sizes of the logical records); see the sketch of the split calculation below.
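For reference, here is a minimal sketch of the split-size loop in Hadoop's FileInputFormat. The SPLIT_SLOP constant (1.1) mirrors the real source; the class name and everything else are illustrative, not the actual implementation:

```java
// Simplified sketch of how FileInputFormat carves a file into splits.
// SPLIT_SLOP mirrors the real Hadoop constant; the rest is illustrative.
public class SplitSketch {

    private static final double SPLIT_SLOP = 1.1; // 10% slack, as in FileInputFormat

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // 64 MB default block size
        long fileSize  = 260L * 1024 * 1024;  // the 260 MB file from the question

        long bytesRemaining = fileSize;
        int splits = 0;
        // Keep carving off block-sized splits while the remainder is
        // more than 10% larger than one split.
        while ((double) bytesRemaining / blockSize > SPLIT_SLOP) {
            System.out.printf("split %d: %d MB%n", ++splits, blockSize >> 20);
            bytesRemaining -= blockSize;
        }
        if (bytesRemaining != 0) {
            // The trailing remainder becomes one final, slightly larger
            // split instead of a tiny 4 MB split of its own.
            System.out.printf("split %d: %d MB%n", ++splits, bytesRemaining >> 20);
        }
    }
}
```

Running this prints three 64 MB splits and one 68 MB split: 68 / 64 = 1.0625 is within the 1.1 slop, so the trailing 4 MB is folded into the last split, which is why the job reports 4 splits rather than 5.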

– Ronak Patel
- Suppose the logical record size is just a few KB. Let's say each line/record in the file is 1 KB; how many input splits will it generate then? – TheCodeCache Feb 13 '18 at 08:26
- 64,000 such records will form one input split of 64 MB. – Ronak Patel Feb 13 '18 at 12:50
- Correct! But given the data in the question, how many input splits will be generated when each line/record is 1 KB? Will it be 4 or 5 splits? – TheCodeCache Feb 13 '18 at 13:22
- If the whole 260 MB consists of 1 KB records, that is 260,000 KB of data, and 260,000 / 64,000 = 4.06 input splits. Since a record is never split between two input splits, seeing ~4 input splits in the log is expected. – Ronak Patel Feb 13 '18 at 13:47
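To make the comment's arithmetic concrete, here is a hypothetical sketch (assuming 1 MB = 1000 KB, matching the comment's round numbers) of how 1 KB records would distribute over the four splits computed above. A record that straddles a split boundary is read whole by the split where it starts, so the per-split counts are approximate:

```java
// Hypothetical sketch: distributing 1 KB records over the four splits
// from the calculation above (64, 64, 64, 68 MB). Uses 1 MB = 1000 KB
// to match the comment's round numbers.
public class RecordDistribution {
    public static void main(String[] args) {
        long[] splitKb = {64_000, 64_000, 64_000, 68_000}; // split sizes in KB
        long total = 0;
        for (int i = 0; i < splitKb.length; i++) {
            long records = splitKb[i]; // 1 KB per record, so KB == record count
            total += records;
            System.out.printf("split %d: ~%d records%n", i + 1, records);
        }
        System.out.println("total records: " + total); // 260,000
    }
}
```

The totals confirm the thread's conclusion: all 260,000 records are covered by 4 splits, with the last split simply carrying the extra ~4,000 records.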