I have a text file of 100 TB and it has multiline records. And we are not given that each records takes how many lines. One records can be of size 5 lines, other may be of 6 lines another may be 4 lines. Its not sure the line size may vary for each record.
So I cannot use default TextInputFormat, I have written my own inputformat and a custom record reader but my confusion is : When splits are happening, I am not sure if each split will contain the full record. Some part of record can go in split 1 and another in split 2. But this is wrong.
So, can you suggest how to handle this scenario so that I guarantee that my full record goes in a single InputSplit ?
Thanks in advance -JE