I am relatively new to the Hadoop world. I have been following the examples I could find to understand how the record-splitting step works in MapReduce jobs. I noticed that TextInputFormat splits a file into records whose key is the byte offset at which each line starts within the file (a LongWritable) and whose value is the line's contents (a Text). This means that two different records coming from different input files could have the same offset as their key.
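To check my understanding, here is a small stand-alone Java simulation of how I believe the keys are assigned. It is not Hadoop code: `OffsetKeyDemo`, `LineRecord` and `toRecords` are made-up names, and the helper ignores `\r` handling and split boundaries. It only shows that each file's offsets restart at 0, so keys repeat across files.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class OffsetKeyDemo {
    // Mirrors the (key, value) pair a mapper receives from TextInputFormat:
    // offset = byte position where the line starts, line = the line's text.
    static final class LineRecord {
        final long offset;
        final String line;

        LineRecord(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }

        @Override
        public String toString() {
            return "(" + offset + ", \"" + line + "\")";
        }
    }

    // Hypothetical helper, not Hadoop code: splits contents on '\n' and records
    // the byte offset at which each line starts. Ignores '\r' and split boundaries.
    static List<LineRecord> toRecords(String contents) {
        byte[] bytes = contents.getBytes(StandardCharsets.UTF_8);
        List<LineRecord> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                records.add(new LineRecord(start,
                        new String(bytes, start, i - start, StandardCharsets.UTF_8)));
                start = i + 1;
            }
        }
        if (start < bytes.length) { // final line without a trailing newline
            records.add(new LineRecord(start,
                    new String(bytes, start, bytes.length - start, StandardCharsets.UTF_8)));
        }
        return records;
    }

    public static void main(String[] args) {
        List<LineRecord> fileA = toRecords("hello world\nfoo bar\n");
        List<LineRecord> fileB = toRecords("another file\nbaz\n");
        // Both files yield a record whose key is 0, so offsets alone collide.
        System.out.println("a.txt: " + fileA);
        System.out.println("b.txt: " + fileB);
    }
}
```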
Does having the same key for records from different files affect the mapper in any way? My understanding is that uniqueness of the mapper's input key is irrelevant if the key is never used (e.g. wordcount, which ignores it). But if the mapper does need to process the key, it may have to be unique. Can anyone elaborate on this?
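If a unique key really is needed, one possible approach would be to pair the offset with the input file's name. In the newer `org.apache.hadoop.mapreduce` API, I believe the name can be read inside `map()` with `((FileSplit) context.getInputSplit()).getPath().getName()`. The sketch below only simulates the key-building step in plain Java; `CompositeKeyDemo` and `compositeKey` are hypothetical names, and it assumes file names are distinct (full paths would be safer if files in different directories can share a name).

```java
import java.util.HashSet;
import java.util.Set;

public class CompositeKeyDemo {
    // Hypothetical composite key: input file name plus byte offset.
    // Offsets alone repeat across files; the (file, offset) pair does not,
    // as long as the file names are distinct.
    static String compositeKey(String fileName, long offset) {
        return fileName + ":" + offset;
    }

    public static void main(String[] args) {
        // Offsets as they might come from two input files (both start at 0).
        long[] offsetsA = {0, 12};
        long[] offsetsB = {0, 13};

        Set<Long> rawKeys = new HashSet<>();
        Set<String> compositeKeys = new HashSet<>();
        for (long o : offsetsA) {
            rawKeys.add(o);
            compositeKeys.add(compositeKey("a.txt", o));
        }
        for (long o : offsetsB) {
            rawKeys.add(o);
            compositeKeys.add(compositeKey("b.txt", o));
        }

        // 4 records in total: raw offsets collapse to 3 distinct keys,
        // while the composite keys stay distinct (4).
        System.out.println("distinct raw offsets: " + rawKeys.size());
        System.out.println("distinct composite keys: " + compositeKeys.size());
    }
}
```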
Thanks in advance.