Related question: https://stackoverflow.com/posts/18164848
The input file input.txt is a tab-delimited Unicode text file containing:
a A e f m
b B g h
c C i j
b B k l
I want to match rows by their first and second columns and merge the matching rows, so that output.txt contains:
a A e f m
b B g h k l
c C i j
The code has to detect the maximum number of columns in the input. Since that is 5 in this example, "k l" are placed starting from the 6th column (the 5th column of that row stays empty).
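To make the intended behaviour concrete, the merge described above can be sketched in Python roughly as follows. This is only a sketch under assumptions: the names merge_rows and merge_file are made up for illustration, the file is assumed to be UTF-8 (a file saved as "Unicode" from Windows Notepad is probably UTF-16, in which case encoding="utf-16" would be needed), and merged rows are written in order of first appearance.

```python
def merge_rows(rows):
    """Merge rows that share their first two columns.

    The first row seen for a key is kept as-is. When another row with the
    same key arrives, the kept row is padded with empty cells up to the
    widest row in the input, and the new row's remaining cells are appended.
    """
    rows = [r for r in rows if r]                      # ignore empty rows
    width = max(len(r) for r in rows) if rows else 0   # maximum column count
    merged = {}                                        # keeps insertion order (Python 3.7+)
    for row in rows:
        key = tuple(row[:2])
        if key not in merged:
            merged[key] = list(row)
        else:
            current = merged[key]
            # Pad only if the kept row is shorter than the widest input row.
            current.extend([""] * (width - len(current)))
            current.extend(row[2:])
    return list(merged.values())


def merge_file(in_path, out_path, encoding="utf-8"):
    """Read a tab-delimited file, merge its rows, and write the result.

    The encoding default is an assumption; change it to match the real file.
    """
    with open(in_path, encoding=encoding, newline="") as src:
        rows = [line.rstrip("\r\n").split("\t") for line in src if line.strip()]
    with open(out_path, "w", encoding=encoding, newline="") as dst:
        for row in merge_rows(rows):
            dst.write("\t".join(row) + "\n")
```

With the example above, calling merge_file("input.txt", "output.txt") should write the "b B" row as b, B, g, h, an empty 5th cell, then k, l.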
I actually almost managed to do this in Matlab when all the values were numbers. But once they were letters, Matlab handled Unicode so badly that I gave up, even after reading Stack Overflow posts on how to deal with Unicode in Matlab. So I have now turned to Python.
Nirk, at https://stackoverflow.com/posts/18164848, responded that the following line would do it:
awk -F\t '{a=$1 "\t" $2; $1=$2=""; x[a] = x[a] $0} END {for(y in x) print y,x[y]}'
However, this command doesn't seem to specify an input or output file.