The input file is a tab-delimited Unicode text file containing:
a A e f m
b B g h
c C i j
b B k l
I want to match rows on the first and second columns and merge them, so the result should be:
a A e f m
b B g h k l
c C i j
The code has to detect the maximum number of columns in the input. Since that is 5 in this example, the "b B g h" row is padded to 5 columns and "k l" starts in the 6th column.
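To make the expected behaviour concrete, here is a minimal Python 3 sketch of the merge described above. The names `merge_rows` and `merge_file` are made up for illustration, and the `utf-8` default is an assumption (a "Unicode" text file saved on Windows may actually be `utf-16`). It also assumes that a third or later row with the same key is simply appended after the previous one, and that the whole file fits in memory.

```python
def merge_rows(rows):
    """Merge rows that share the same first two columns.

    The first row seen for a key is kept. When a later row with the same
    key appears, the kept row is padded with empty cells up to the widest
    row in the input, and the later row's remaining cells are appended.
    """
    rows = [list(r) for r in rows if r]            # copy rows, skip empty ones
    width = max((len(r) for r in rows), default=0)  # maximum column count
    merged = {}                                     # dicts keep insertion order (3.7+)
    for row in rows:
        key = tuple(row[:2])
        if key in merged:
            target = merged[key]
            # Pad only if the kept row is shorter than the widest row.
            target.extend([""] * (width - len(target)))
            target.extend(row[2:])
        else:
            merged[key] = row
    return list(merged.values())


def merge_file(src, dst, encoding="utf-8"):
    """Read a tab-delimited file, merge its rows, and write the result."""
    with open(src, encoding=encoding) as f:
        rows = [line.rstrip("\n").split("\t") for line in f if line.strip()]
    with open(dst, "w", encoding=encoding) as f:
        for row in merge_rows(rows):
            f.write("\t".join(row) + "\n")


sample = [
    ["a", "A", "e", "f", "m"],
    ["b", "B", "g", "h"],
    ["c", "C", "i", "j"],
    ["b", "B", "k", "l"],
]
for row in merge_rows(sample):
    print(row)
# ['a', 'A', 'e', 'f', 'm']
# ['b', 'B', 'g', 'h', '', 'k', 'l']
# ['c', 'C', 'i', 'j']
```

With real data the call would be something like `merge_file("input.txt", "output.txt")`, where both file names are placeholders.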
I almost managed to do this in Matlab when all the values were numbers. But once they were letters, Matlab handled Unicode so badly that I gave up, even after reading Stack Overflow posts on dealing with Unicode in Matlab. So I have now turned to Python.
It also seemed doable with Excel VBA, but since the data is so large, I guessed Python would be faster than Excel VBA. (Am I guessing correctly?)