-1

I have limited coding skills in Unix so I am asking for help.

I have 2 files coming in:

file1:

column names

Key Count

68150    500
68220    300
68448    200
68778    500

file2:

68150    53
68171     6
68448    18
68778    05
68993    13

What I need is: 1 - A single record for all of the unique keys between the 2 files. 2 - Where the key matches the Count fields are added together.

Output file:

68150    553
68171      6
68220    300
68448    215
68778    505
68993    13

I have looked a the 'join' 'comm' 'diff' and 'sort' commands and they don't appear to be what I need.

1 Answers1

0

In bash, this should work (assuming the input files are /tmp/file1 and /tmp/file2)

join -a 1 -a 2 <(sort -b /tmp/file1) <(sort -b /tmp/file2) | while read a b c; do
    echo $a $((${b:-0} + ${c:-0}))
done

Let me explain it step by step:

First, we have to join the lines (and by passing -a 1 -a 2 not dropping any unmatched records)

$ join -a 1 -a 2 <(sort -b /tmp/file1) <(sort -b /tmp/file2)
68150 500 53
68171 6
68220 300
68448 200 18
68778 500 05
68993 13

Then, we can just read everything line by line using read a b c, so the first column will be in $a, the second in $b and the third in $c. The last problem we have to solve is that it might only have two and not three columns, but that's easy in bash: ${c:-0} will evaluate to $c if non-empty and have a default value of 0 otherwise.

Oh, and using $(( expr )) we can do computations. Final output:

$ join -a 1 -a 2 <(sort -b /tmp/file1) <(sort -b /tmp/file2) | while read a b c; do echo $a $((${b:-0} + ${c:-0})); done
68150 553
68171 6
68220 300
68448 218
68778 505
68993 13
Johannes Weiss
  • 52,533
  • 16
  • 102
  • 136
  • I don't believe `join` will work if the files are sorted numerically. I'd drop the `-n` from the `sort` invocations. – Steve Summit Jun 29 '15 at 20:04
  • hi @SteveSummit, I believe it will work with both as the important thing is that lines are sorted similarly in both files. That's both true for `-n` and non `-n`. But `-n` is certainly not necessary so I dropped it :). – Johannes Weiss Jun 29 '15 at 20:09
  • Well, we had terrible problems with that over at [this question](http://stackoverflow.com/questions/30894105/join-statement-omitting-entries/). If the numbers are such that numeric verus alphabetic sorting gives different answers (think 10, 20, 101), join misses some of the joins, because it always assumes alphabetical sorting. – Steve Summit Jun 29 '15 at 20:19
  • yes, so the `join` man page does in fact mention we should use the `-b` option. So I'll change the answer accordingly. – Johannes Weiss Jun 29 '15 at 21:03