This question is NOT a duplicate of: How does Hadoop process records split across block boundaries?
I have a question regarding input split calculation. According to the Hadoop guide:
1) InputSplits respect record boundaries.
2) At the same time, it says that splits are calculated by the job submitter, which I assume runs on the client side. [Anatomy of a MapReduce Job Run - Classic MRv1]
Does this mean that:
(a) the job submitter reads the blocks to calculate input splits? If that is the case, won't it be very inefficient and defeat the very purpose of Hadoop?
Or
(b) the job submitter merely calculates splits as estimates based on block sizes and locations (i.e., metadata only), and it then becomes the responsibility of the InputFormat and RecordReader, running under the mapper, to fetch records across host boundaries?
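For reference, here is my simplified reading of what `FileInputFormat.getSplits` does (a sketch, not the actual Hadoop source; the class name `SplitSketch`, the helper `hostsForOffset`, and the omission of min/goal split size handling are all mine). As far as I can tell, it only consults NameNode metadata (file length, block locations) and never opens the file data itself:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.mapred.FileSplit;

    // Simplified sketch of split calculation; names and
    // simplifications are mine, not the Hadoop source.
    public class SplitSketch {
        static List<FileSplit> computeSplits(FileSystem fs, FileStatus file,
                                             long splitSize) throws Exception {
            List<FileSplit> splits = new ArrayList<>();
            long length = file.getLen();
            // Metadata-only call to the NameNode: which hosts hold which
            // byte ranges of the file. No file data is transferred.
            BlockLocation[] blocks = fs.getFileBlockLocations(file, 0, length);
            long bytesRemaining = length;
            while (bytesRemaining > 0) {
                long start = length - bytesRemaining;
                long size = Math.min(splitSize, bytesRemaining);
                // Use the hosts of the block containing the split's start
                // offset. The split boundary is a plain byte offset and may
                // fall in the middle of a record.
                String[] hosts = hostsForOffset(blocks, start);
                splits.add(new FileSplit(file.getPath(), start, size, hosts));
                bytesRemaining -= size;
            }
            return splits;
        }

        // Hypothetical helper: find the hosts of the block that
        // contains the given byte offset.
        static String[] hostsForOffset(BlockLocation[] blocks, long offset) {
            for (BlockLocation b : blocks) {
                if (offset >= b.getOffset()
                        && offset < b.getOffset() + b.getLength()) {
                    return b.getHosts();
                }
            }
            return new String[0];
        }
    }

If this reading is right, the splits are cheap byte-range estimates, and it is then the RecordReader that, at map time, skips the partial record at the start of its split and reads past the end of its split to finish the last record, which is what interpretation (b) would imply.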
Thanks