Is there a good way to perform 'full outer join' multiple files on Unix ideally using GNU coreutils.
My files were produced by uniq -c
, below is a mock example the can be generated by:
echo "12 aa\n3 bb" > file1
echo "5 aa\n6 bb\n1 cc" > file2
echo "11 aa\n7 bb\n4 dd" > file3
and looks like:
tail -n +1 file*
==> file1 <==
12 aa
3 bb
==> file2 <==
5 aa
6 bb
1 cc
==> file3 <==
11 aa
7 bb
4 dd
I want to merge them one by one, using the sequence (column 2) as key filling in a 0 if the key is not included (outer join). I.e. the desired output would look like this
12 5 11 aa
3 6 6 bb
0 1 0 cc
0 0 4 dd
So far I found join
to be doing the job at least for pairwise merging:
join -j2 file1 file2 -a 2 -a 2 -e '0' -o '1.1,2.1,0' > merged
# 12 5 aa
# 3 6 bb
# 0 1 cc
Note: j2 : looking at second column for key (for both files) -a FILENUM: also print unpairable lines from file FILENUM, where FILENUM is 1 or 2, corresponding to FILE1 or FILE2
But I don't know how to generalize it for multiple files, i.e. this one doesn't work, which means I can't easily put it in a loop:
join -j2 merged file3 -a 2 -a 2 -e '0' -o '1.1,2.1,0' > merged2
I'd be ideally not want to use SQL to achive this, but would be ok if there wasn't any other way.