As I understand it, splitting a file into blocks when it is copied into HDFS and computing input splits on that file for mapper input are two entirely different mechanisms.
Here is my question:
Suppose my File1 is 128 MB and was split into two blocks stored on two different data nodes (Node1 and Node2) in a Hadoop cluster. I want to run an MR job on this file, and it produces two input splits of 70 MB and 58 MB respectively. The first mapper will run on Node1 and take the 70 MB input split, but Node1 holds only 64 MB of the data; the remaining 6 MB sits on Node2.
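For context, here is how I understand the split size could come out as 70 MB even though the HDFS block size is 64 MB. This is only a minimal sketch of my own (not the actual Hadoop source); it assumes the new-API config keys `mapreduce.input.fileinputformat.split.minsize` and `mapreduce.input.fileinputformat.split.maxsize` and the `FileInputFormat` formula max(minSize, min(maxSize, blockSize)):

```java
/**
 * Minimal sketch (my own illustration, not actual Hadoop code) of how
 * FileInputFormat could produce 70 MB and 58 MB splits for a 128 MB file
 * stored as 64 MB HDFS blocks.
 */
public class SplitSizeSketch {
    public static void main(String[] args) {
        long fileLength = 128L * 1024 * 1024;  // File1: 128 MB
        long blockSize  = 64L * 1024 * 1024;   // dfs.blocksize: 64 MB
        long minSize    = 70L * 1024 * 1024;   // mapreduce.input.fileinputformat.split.minsize (assumed set to 70 MB)
        long maxSize    = Long.MAX_VALUE;      // mapreduce.input.fileinputformat.split.maxsize (default)

        // FileInputFormat's computeSplitSize: max(minSize, min(maxSize, blockSize))
        long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));

        // Walk the file length and print the resulting splits.
        long remaining = fileLength;
        while (remaining > 0) {
            long thisSplit = Math.min(splitSize, remaining);
            System.out.printf("split of %d MB%n", thisSplit / (1024 * 1024));
            remaining -= thisSplit;
        }
        // Prints: "split of 70 MB" then "split of 58 MB" -- matching the
        // scenario above, even though each HDFS block is only 64 MB.
    }
}
```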
To complete the map task on Node1, does Hadoop transfer the 6 MB of data from Node2 to Node1? If yes, what happens if Node1 does not have enough storage to hold that 6 MB from Node2?
Apologies if my question is awkward.