
A file of size 260 MB is stored in HDFS, where the default block size is 64 MB. When I run a MapReduce job against this file, I find that it creates only 4 input splits. How is that calculated? Where did the remaining 4 MB go? Any input is much appreciated.

TheCodeCache

1 Answer


An input split is NOT always the same as a block. An input split is a logical representation of the data. Your input splits could have been 63 MB, 67 MB, 65 MB, and 65 MB (or other sizes, depending on where the logical records end)... see the examples in the links below...

Hadoop input split size vs block size

Another example - see section 3.3...
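For intuition, here is a minimal, self-contained sketch that mirrors the default split computation of Hadoop's FileInputFormat (illustrative code, not Hadoop's actual source; it assumes the default minSize and maxSize, so that splitSize equals the block size). Hadoop keeps cutting full-sized splits only while more than SPLIT_SLOP (1.1) times the split size remains, so the trailing 4 MB is folded into the last split instead of forming a fifth one:

```java
// Minimal sketch of FileInputFormat's default split computation
// (illustrative, not Hadoop's actual source).
public class InputSplitSketch {

    // Same constant Hadoop uses: the last split may grow up to 10%
    // beyond splitSize before a new split is cut.
    private static final double SPLIT_SLOP = 1.1;

    static int countSplits(long fileSize, long splitSize) {
        int splits = 0;
        long bytesRemaining = fileSize;
        // Cut full-sized splits while more than 1.1 * splitSize remains.
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        // Whatever is left (here 64 MB + 4 MB = 68 MB) becomes the final split.
        if (bytesRemaining > 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 260 MB file, 64 MB blocks -> prints 4 (splits of 64, 64, 64, 68 MB)
        System.out.println(countSplits(260 * mb, 64 * mb));
    }
}
```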

Ronak Patel
  • Suppose the logical record size is just a few KB. Say each line/record in the file is 1 KB; how many input splits will it generate then? – TheCodeCache Feb 13 '18 at 08:26
  • 64,000 such records will form one input split of 64 MB. – Ronak Patel Feb 13 '18 at 12:50
  • Correct! But given the numbers in the question, how many input splits will be generated when each line/record is 1 KB? Will it be 4 or 5 splits? – TheCodeCache Feb 13 '18 at 13:22
  • If all 260 MB consists of 1 KB records, that is 260,000 KB of data, and 260,000 / 64,000 = 4.06 input splits; since a record never gets split between two input splits, seeing ~4 input splits in the log is expected. – Ronak Patel Feb 13 '18 at 13:47
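Running the sketch above on these numbers shows why it is 4 splits and not 5: Hadoop only cuts a new split while more than 1.1 × splitSize remains, so the last split absorbs the remainder (64 MB + 4 MB = 68 MB, and 68 / 64 = 1.0625 ≤ 1.1). The computation is byte-based, so with default FileInputFormat behaviour the record length does not change the count; the RecordReader simply reads past a split boundary to finish the record it started.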