
I have two files.

file1 has some keys that have abc in the second column:

et1 abc
et2 abc
et55 abc  

file2 has the column 1 values from file1 in its last column, along with some other numbers I need to add up:

1 2 3 4 5 et1
5 5 5 5 5 et100
3 3 3 3 3 et55
5 5 5 5 4 et1
6 6 6 6 3 et1

For the keys extracted from file1, I need to add up column 5 of file2 wherever column 6 matches one of those keys. file2 itself is very large.

This command seems to be working but it is very slow:

 egrep -isr "abc" file1.tcl | awk '{print $1}' | grep -vwf /dev/stdin file2.tcl | awk '{tl+=$5} END {print tl}'

How would I go about optimizing the pipe? Also, what am I doing wrong with grep -f? Is it generally not recommended to do something like this?

Edit: Expected output is the sum of column 5 over all rows of file2 whose column 6 key is present in file1.

Edit 2: Since file1 has the keys et1, et2 and et55, adding up column 5 of the matching rows 1, 3, 4 and 5 of file2 gives the expected output 5+3+4+3=15.

identical123456

2 Answers


Use a single awk to read file1 into the keys of an array. Then when reading file2, add $5 to a total variable when $6 is in the array.

awk 'NR==FNR {if ($2 == "abc") a[$1] = 0; 
              next}
     $6 in a {total += $5}
     END { print total }
    ' file1.tcl file2.tcl
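
As a quick check against the sample data in the question, here is a minimal sketch that wraps the same single-awk approach in a shell function and runs it on copies of the two files. The function name `sum_matching`, the array name `keys`, and the scratch directory are illustrative only; `total + 0` is a small addition so that the command prints 0 rather than an empty line when nothing matches.

```shell
#!/bin/sh
# Hypothetical helper: $1 = key file (file1), $2 = data file (file2).
sum_matching() {
    # First file: remember keys whose second column is "abc".
    # Second file: add column 5 when column 6 is one of those keys.
    awk 'NR==FNR { if ($2 == "abc") keys[$1]; next }
         $6 in keys { total += $5 }
         END { print total + 0 }' "$1" "$2"
}

# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'et1 abc' 'et2 abc' 'et55 abc' > "$tmp/file1.tcl"
printf '%s\n' '1 2 3 4 5 et1' '5 5 5 5 5 et100' '3 3 3 3 3 et55' \
              '5 5 5 5 4 et1' '6 6 6 6 3 et1' > "$tmp/file2.tcl"

sum_matching "$tmp/file1.tcl" "$tmp/file2.tcl"   # prints 15
rm -rf "$tmp"
```

Only the keys from file1 are held in memory, so the size of file2 does not affect memory use; file2 is read once, line by line.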
Barmar
  • @EdMorton Not needed since I initialized them all to 0. – Barmar Dec 18 '18 at 00:48
  • But it turns out he doesn't want per-key totals, just a grand total. – Barmar Dec 18 '18 at 00:50
  • Thanks. Possibly final question: how would I go about optimizing the grep/awk pipes in the original post while making minimal changes to them? Is grep -f inherently slow? – identical123456 Dec 18 '18 at 01:10
  • No, not particularly. But two processes are usually slower than one, unless `grep` is significantly faster than `awk`'s built-in matching. And `grep` has to search the entire line, while `awk` can just match the specific field. – Barmar Dec 18 '18 at 01:12
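
On the "what am I doing wrong with grep -f" part of the question: `-v` inverts the match, so the original pipeline keeps the rows of file2 whose key is not in the list (in the sample data that is only the et100 row). A minimal-change sketch, assuming GNU grep and a system that provides /dev/stdin, drops `-v` and adds `-F` so the keys are treated as fixed strings rather than regular expressions. The function name `sum_with_grep` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: $1 = key file (file1), $2 = data file (file2).
sum_with_grep() {
    # Stage 1-2: extract the keys from lines containing "abc" (as in the question).
    # Stage 3: keep rows of the data file containing any key as a whole word
    #          (no -v, so matches are kept instead of discarded).
    # Stage 4: add up column 5 of the surviving rows.
    grep -i "abc" "$1" | awk '{print $1}' |
        grep -wFf /dev/stdin "$2" |
        awk '{ tl += $5 } END { print tl + 0 }'
}
```

With the sample files, `sum_with_grep file1.tcl file2.tcl` should give the expected 15. Note that this still matches a key anywhere on the line, not only in column 6, which is the difference from the awk version pointed out above.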

Could you please try the following, which reads file2.tcl first and uses fewer loops. Since your expected output is not clear, I haven't completely tested it.

awk 'FNR==NR{a[$NF]+=$(NF-1);next} $2=="abc"{print $1,a[$1]+0}'  file2.tcl file1.tcl
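
On the sample data this command prints one total per key (et1 12, et2 0, et55 3) rather than the single grand total requested in the question's edits. A possible variant, sketched here with a hypothetical function name `sum_file2_first`, accumulates those per-key sums into one number instead of printing each:

```shell
#!/bin/sh
# Hypothetical helper: $1 = data file (file2), $2 = key file (file1).
sum_file2_first() {
    # First file: per-key sum of the second-to-last column, keyed by the last column.
    # Second file: for each "abc" key, add its per-key sum to the grand total.
    awk 'FNR==NR { a[$NF] += $(NF-1); next }
         $2 == "abc" { total += a[$1] }
         END { print total + 0 }' "$1" "$2"
}
```

Because every distinct key of file2 is stored in the array before file1 is read, this ordering can use more memory than reading file1 first when file2 is very large.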
RavinderSingh13