awk - if values match then print file1 and file 2

Question

I googled my problem a lot and tested different solutions, but none seem to work. I have even used the same command successfully before, but now I cannot get my desired output.

I have file1

AAA;123456789A
BBB;123456789B
CCC;123456789C

And file2

1;2;3;CCC;pippo
1;2;3;AAA;pippo
1;2;3;BBB;pippo
1;2;3;*;pippo

My desired output is this:

1;2;3;CCC;pippo;CCC;123456789C
1;2;3;AAA;pippo;AAA;123456789A
1;2;3;BBB;pippo;BBB;123456789B

I tried with this command:

awk -F";" -v OFS=";" 'FNR == NR {a[$10]=$1; b[$20]=$2; next}($10 in a){ if(match(a[$10],$4)) print $0,a[$10],b[$20]}' file1 file2

But I get this output (only one entry, even with bigger files):

1;2;3;CCC;pippo;CCC;123456789C

What am I doing wrong? If it works for one line, it should work for all the others. Why is this not happening? Also, why doesn't it work if I set a[$1]=$1?
Thank you for helping! If possible, could you explain the answer, so that next time I won't have to ask for help?

EDIT: Sorry, I did not mention (since I wanted to keep the example minimal) that in file2 some fields are just "*". I'd also like to add an "else" branch that does something when there is no match.

So you want to match 1st field in file1 with 4th field in file2? Also, why are you fetching fields `$10` and `$20` if there are just 5 or 6? — fedorqui, Dec 02 '15 at 13:53
@CasimiretHippolyte it is a typo, `arr` should be `a`. @fedorqui I just used values bigger than 4. But why can't I use $1 and $2 for my array `a`? Isn't it empty? — Stefano_g, Dec 02 '15 at 14:38
@EdMorton so you found where I make the mistake. How can I pass the whole col1 from file1 and compare it with col4 of file2, then? I adapted my code from here http://stackoverflow.com/questions/31168521/awk-search-column-from-one-file-if-match-print-columns-from-both-files . — Stefano_g, Dec 02 '15 at 14:52
Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins. You need to get a basic understanding of awk as a foundation rather than just copy/pasting scripts and trying random changes hoping it'll work. — Ed Morton, Dec 02 '15 at 15:20

karakfa · Accepted Answer · 2015-12-02T15:30:34.037

1

awk to the rescue!

$ awk 'BEGIN{FS=OFS=";"} 
     NR==FNR{a[$1]=$0;next} 
            {print $0,a[$4]}' file1 file2

1;2;3;CCC;pippo;CCC;123456789C
1;2;3;AAA;pippo;AAA;123456789A
1;2;3;BBB;pippo;BBB;123456789B
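
To see why keying the array on $1 matters, here is a small sketch (the scratch directory and the variable names bad and good are my own, not from the question). It counts how many entries the array holds after reading file1, first with the $10 key from the question's command, then with $1. file1 has only two fields, so $10 is always the empty string: each line overwrites the previous one and only the last (CCC) survives, which would explain the single line of output.

```shell
# Recreate the sample file1 from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'AAA;123456789A' 'BBB;123456789B' 'CCC;123456789C' > "$tmp/file1"

# Key on $10 (a field that does not exist): every line lands under the
# empty-string key, so the array ends up with one entry, the last line's $1.
bad=$(awk -F';' '{a[$10]=$1} END{n=0; for (k in a) n++; print n ";" a[""]}' "$tmp/file1")
echo "$bad"

# Key on $1: one entry per line of file1.
good=$(awk -F';' '{a[$1]=$0} END{n=0; for (k in a) n++; print n}' "$tmp/file1")
echo "$good"
```

On the sample data this prints 1;CCC and then 3.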

UPDATE: Based on the original input file, the script above assumed that every line of file2 has an exact match in file1. If you want to skip the entries where there is no match, you need to qualify the print block with the condition $4 in a:

$ awk 'BEGIN{FS=OFS=";"} 
     NR==FNR{a[$1]=$0;next} 
     $4 in a{print $0,a[$4]}' file1 file2
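
For the "else" case mentioned in the question's EDIT, one possible sketch is to move the $4 in a test into an if/else. The NO_MATCH placeholder, the scratch directory and the result variable are assumptions of mine for illustration; replace the else branch with whatever you actually want to do for unmatched lines such as the "*" one.

```shell
# Recreate the sample inputs from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'AAA;123456789A' 'BBB;123456789B' 'CCC;123456789C' > "$tmp/file1"
printf '%s\n' '1;2;3;CCC;pippo' '1;2;3;AAA;pippo' '1;2;3;BBB;pippo' '1;2;3;*;pippo' > "$tmp/file2"

result=$(awk 'BEGIN{FS=OFS=";"}
     NR==FNR{a[$1]=$0; next}      # first file: index each whole line by field 1
     {
         if ($4 in a) {
             print $0, a[$4]      # key found: append the matching file1 line
         } else {
             print $0, "NO_MATCH" # hypothetical placeholder for "no match"
         }
     }' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
```

With the sample files, the three matching lines come out as in the desired output and the "*" line is printed as 1;2;3;*;pippo;NO_MATCH. Note that $4 in a is an exact string lookup, so "*" is treated as a literal key rather than a wildcard.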

edited Dec 02 '15 at 15:30

answered Dec 02 '15 at 14:41


Seems to work perfectly! Thanks a lot! Could you kindly tell me why you do not need to use `if(match())`? – Stefano_g Dec 02 '15 at 15:02
input file changed after my answer. – karakfa Dec 02 '15 at 15:27

score 0 · Answer 2 · answered Dec 02 '15 at 13:59

join is made for this sort of thing:

$ join -t';' -1 4 -o1.{1..5} -o2.{1..2} <(sort -t';' -k4 file2) <(sort -t';' file1)

1;2;3;AAA;pippo;AAA;123456789A
1;2;3;BBB;pippo;BBB;123456789B
1;2;3;CCC;pippo;CCC;123456789C

The output is what you asked for except for the ordering of lines, which I assume isn't important. The -o options to join are needed because you want the full set of fields; if you omit them, the join field appears only once, on the left, which might also be fine.
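
As a rough illustration of what omitting -o looks like, here is the same join on the sample data without it. This is a sketch under a few assumptions of mine: temporary files instead of process substitution (so it also runs in plain sh), LC_ALL=C so that sort and join agree on the ordering, and -k4,4 / -k1,1 to restrict each sort key to the join field. The scratch paths and the joined variable are made up for the example.

```shell
# Recreate the sample inputs from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'AAA;123456789A' 'BBB;123456789B' 'CCC;123456789C' > "$tmp/file1"
printf '%s\n' '1;2;3;CCC;pippo' '1;2;3;AAA;pippo' '1;2;3;BBB;pippo' '1;2;3;*;pippo' > "$tmp/file2"

# join needs both inputs sorted on their join field.
LC_ALL=C sort -t';' -k4,4 "$tmp/file2" > "$tmp/file2.sorted"
LC_ALL=C sort -t';' -k1,1 "$tmp/file1" > "$tmp/file1.sorted"

# Without -o: join field first, then the other fields of each input.
joined=$(LC_ALL=C join -t';' -1 4 -2 1 "$tmp/file2.sorted" "$tmp/file1.sorted")
printf '%s\n' "$joined"
```

On the sample data this prints AAA;1;2;3;pippo;123456789A and the corresponding BBB and CCC lines. The "*" line has no partner in file1, so join drops it by default.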

Nice! Unfortunately it works perfectly on the example but not on my real file, as I mention in the EDIT. Sorry, I forgot to explain that part. — Stefano_g, Dec 02 '15 at 14:57
