compare two fields from two different files using awk

Question

I have two files in which I want to compare certain fields and produce output.

I also have a shell variable:

echo ${CURR_SNAP}
123

File1

DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|RSCNAME1
DOMAIN2|USER2|LE2|ORG2|ACCES2|RSCTYPE2|RSCNAME2
DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|RSCNAME3
DOMAIN4|USER4|LE4|ORG4|ACCES4|RSCTYPE4|RSCNAME4

File2

ORG1|PRGPATH1
ORG3|PRGPATH3
ORG5|PRGPATH5
ORG6|PRGPATH6
ORG7|PRGPATH7

The output I expect is below. The 4th column of File1 should be matched against the 1st column of File2, and for each matching line the last column should be replaced with the CURR_SNAP value:

DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|123
DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|123

I tried the code below, but it doesn't seem to work correctly:

awk -v CURRSNAP="${CURR_SNAP}" '{FS="|"} NR==FNR {x[$0];next} {if(x[$1]==$4) print $1"|"$2"|"$3"|"$4"|"$5"|"$6"|"CURRSNAP}' File2 File1
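For reference, the attempt above prints nothing because `x[$0]` stores each whole File2 line (e.g. `ORG1|PRGPATH1`) as a key with an empty value, and `x[$1]==$4` then looks up File1's 1st column (DOMAIN*), which is never a key, so it yields an empty string that never equals the 4th column. Setting `FS="|"` inside an action is also fragile, because the new separator only applies from the next record on. A minimal corrected sketch of the same one-liner, keying on File2's column 1 and testing File1's column 4 with `in`; the helper name `snap_match` and the scratch-directory setup are illustrative assumptions, not part of the question:

```shell
#!/bin/sh
# Scratch directory so the sample File1/File2 below do not clobber real files.
cd "$(mktemp -d)" || exit 1

# Sample data from the question.
printf '%s\n' 'DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|RSCNAME1' \
    'DOMAIN2|USER2|LE2|ORG2|ACCES2|RSCTYPE2|RSCNAME2' \
    'DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|RSCNAME3' \
    'DOMAIN4|USER4|LE4|ORG4|ACCES4|RSCTYPE4|RSCNAME4' > File1
printf '%s\n' 'ORG1|PRGPATH1' 'ORG3|PRGPATH3' 'ORG5|PRGPATH5' \
    'ORG6|PRGPATH6' 'ORG7|PRGPATH7' > File2
CURR_SNAP=123

# Hypothetical helper wrapping the corrected one-liner: snap_match FILE2 FILE1 SNAP
#  -F'|'   : the separator is set before the first record is read
#  NR==FNR : File2, store column 1 (ORG*) as an array key
#  $4 in x : File1, print columns 1-6 plus CURRSNAP when column 4 was stored
snap_match() {
    awk -F'|' -v CURRSNAP="$3" '
        NR==FNR { x[$1]; next }
        $4 in x { print $1"|"$2"|"$3"|"$4"|"$5"|"$6"|"CURRSNAP }
    ' "$1" "$2"
}

snap_match File2 File1 "${CURR_SNAP}"
```

Run as-is, this prints the two lines of the expected output (DOMAIN1 and DOMAIN3, each ending in `|123`).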

This should give you some idea of how to get a shell variable into `awk`: https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script – Jotne, Aug 23 '19 at 22:04
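The linked question covers this in detail; a brief sketch of the two usual ways to hand `CURR_SNAP` to awk, assuming only POSIX awk features (`-v` and `ENVIRON`):

```shell
#!/bin/sh
CURR_SNAP=123

# 1) -v assigns an awk variable before any input is read.
#    Backslash escape sequences in the value (e.g. \t) are interpreted.
awk -v snap="$CURR_SNAP" 'BEGIN { print "via -v:", snap }'

# 2) ENVIRON reads an environment variable verbatim (no escape processing).
#    The prefix assignment exports CURR_SNAP for this single awk command only.
CURR_SNAP="$CURR_SNAP" awk 'BEGIN { print "via ENVIRON:", ENVIRON["CURR_SNAP"] }'
```

Both commands print 123 after their label. `-v` is what the question already uses; `ENVIRON` is handy when the value may contain backslashes that should be passed through unchanged.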

score 0 · Answer 1 · answered Aug 24 '19 at 01:43

I wouldn't use awk at all. This is what join(1) is meant for (plus sed to append the extra column):

$ join -14 -21 -t'|' -o 1.1,1.2,1.3,1.4,1.5,1.6 File1 File2 | sed "s/$/|${CURR_SNAP}/"
DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|123
DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|123

It does require that both files be sorted on the join field, as your examples are.

answered Aug 24 '19 at 01:43

Shawn

Both files are actually GBs in size, and when I tried the command it threw the error "join: file1 is not in sorted order" – Koushik Chandra Aug 24 '19 at 02:14
@KoushikChandra Like I said, join does require that the files be sorted. If yours aren't, consider editing your samples to reflect that. If you're using bash, zsh, etc., one easy way to sort them is `join other args <(sort -t'|' -k4,4 File1) <(sort -t'|' -k1,1 File2)`. Or sort them ahead of time if this is something that's going to be run frequently. – Shawn Aug 24 '19 at 02:34
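Building on that comment, a sketch for a plain POSIX `sh` without process substitution: sort each file into a temporary directory, then join. It assumes `mktemp` is available (common, though not required by POSIX) and runs `sort` and `join` in the C locale so both agree on the ordering; `join_snap` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: join_snap FILE1 FILE2 SNAP
# Sorts each file on its join field into a temporary directory, then joins
# column 4 of FILE1 against column 1 of FILE2 and appends SNAP.
# Assumes SNAP contains no '/', '&' or '\' (it is spliced into a sed command).
join_snap() {
    tmp=$(mktemp -d) || return 1
    # sort and join must agree on collation, so both run in the C locale.
    LC_ALL=C sort -t'|' -k4,4 "$1" > "$tmp/f1"
    LC_ALL=C sort -t'|' -k1,1 "$2" > "$tmp/f2"
    LC_ALL=C join -t'|' -1 4 -2 1 -o 1.1,1.2,1.3,1.4,1.5,1.6 \
        "$tmp/f1" "$tmp/f2" | sed "s/\$/|$3/"
    rm -rf "$tmp"
}

# Demo on deliberately unsorted copies of the sample data, in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'DOMAIN4|USER4|LE4|ORG4|ACCES4|RSCTYPE4|RSCNAME4' \
    'DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|RSCNAME3' \
    'DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|RSCNAME1' \
    'DOMAIN2|USER2|LE2|ORG2|ACCES2|RSCTYPE2|RSCNAME2' > File1
printf '%s\n' 'ORG7|PRGPATH7' 'ORG3|PRGPATH3' 'ORG1|PRGPATH1' \
    'ORG6|PRGPATH6' 'ORG5|PRGPATH5' > File2
join_snap File1 File2 123
```

Even with shuffled input this prints the two expected lines, ordered by the ORG column.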

score 0 · Answer 2 · answered Aug 24 '19 at 03:22

You can do this in awk with two rules. For the first file (where NR==FNR), use string concatenation to join fields 1 through NF-1 and store the result in an array indexed by $4. Then, for the second file (where NR>FNR), the second rule tests whether array[$1] has content and, if so, prints the stored string followed by "|"CURR_SNAP (in the example below, CURR_SNAP is passed in as c and the array is named a), e.g.

CURR_SNAP=123

awk -F'|' -v c="$CURR_SNAP" '
    NR==FNR {
        for (i=1;i<NF;i++)
            a[$4]=i>1?a[$4]"|"$i:a[$4]$1
    }
    NR>FNR {
        if(a[$1])
            print a[$1]"|"c
    }
' file1 file2

Example Use/Output

After changing the filenames to match yours, you can simply copy and middle-mouse-paste the command into your console to test it, e.g.

$ awk -F'|' -v c="$CURR_SNAP" '
>     NR==FNR {
>         for (i=1;i<NF;i++)
>             a[$4]=i>1?a[$4]"|"$i:a[$4]$1
>     }
>     NR>FNR {
>         if(a[$1])
>             print a[$1]"|"c
>     }
> ' file1 file2
DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|123
DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|123

Look things over and let me know if you have further questions.
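A variant of the two-rule approach above, as a sketch: instead of rebuilding fields 1 to NF-1 in a loop, strip the last field from a copy of `$0` with `sub()`. Because each line is assigned fresh, a repeated ORG value in file1 overwrites the stored line; with the loop, the `a[$4]$1` branch would append the new fields onto the earlier line. `prefix_match` is a hypothetical name, and like the original this keeps file1 (not file2) in memory and prints in file2's order.

```shell
#!/bin/sh
# Hypothetical helper: prefix_match FILE1 FILE2 SNAP
# Store FILE1 lines (minus the last field) keyed by column 4,
# then look up column 1 of each FILE2 line.
prefix_match() {
    awk -F'|' -v c="$3" '
        NR==FNR {
            line = $0
            sub(/[|][^|]*$/, "", line)   # drop the last field (RSCNAME*)
            a[$4] = line                 # a repeated ORG overwrites, never appends
            next
        }
        ($1 in a) { print a[$1] "|" c }  # "in" does not create empty entries
    ' "$1" "$2"
}

# Demo on the sample data, in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|RSCNAME1' \
    'DOMAIN2|USER2|LE2|ORG2|ACCES2|RSCTYPE2|RSCNAME2' \
    'DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|RSCNAME3' \
    'DOMAIN4|USER4|LE4|ORG4|ACCES4|RSCTYPE4|RSCNAME4' > file1
printf '%s\n' 'ORG1|PRGPATH1' 'ORG3|PRGPATH3' 'ORG5|PRGPATH5' \
    'ORG6|PRGPATH6' 'ORG7|PRGPATH7' > file2
prefix_match file1 file2 123
```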

score 0 · Accepted Answer · answered Aug 24 '19 at 03:23

With awk:

#! /bin/bash

CURR_SNAP="123"

awk -F'|' -v OFS='|' -v curr_snap="$CURR_SNAP" '{
    if (FNR == NR)
    {
        # this stores the ORG* as an index
        # here you can store other values if needed
        orgs_arr[$1]=1 
    }
    else if (orgs_arr[$4] == 1)
    {
        # overwrite $7 to contain CURR_SNAP value
        $7=curr_snap
        print
    }
}' file2 file1

Since your expected output doesn't include RSCNAME*, I have overwritten $7 (the RSCNAME* column) with $CURR_SNAP. If you want to keep the RSCNAME* column as well, remove $7=curr_snap and change the print statement to print $0, curr_snap.
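The alternative described in the last paragraph, sketched as a complete command. `append_snap` is a hypothetical wrapper, and the awk program is the same logic as the answer, written as two pattern-action rules:

```shell
#!/bin/sh
# Hypothetical helper: append_snap FILE2 FILE1 SNAP
# $7 (RSCNAME*) is kept and SNAP is appended as an extra column;
# print $0, curr_snap joins the two with OFS, which is "|".
append_snap() {
    awk -F'|' -v OFS='|' -v curr_snap="$3" '
        FNR == NR { orgs_arr[$1] = 1; next }      # first file: remember ORG* keys
        orgs_arr[$4] == 1 { print $0, curr_snap } # second file: append SNAP
    ' "$1" "$2"
}

# Demo on the sample data, in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|RSCNAME1' \
    'DOMAIN2|USER2|LE2|ORG2|ACCES2|RSCTYPE2|RSCNAME2' \
    'DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|RSCNAME3' \
    'DOMAIN4|USER4|LE4|ORG4|ACCES4|RSCTYPE4|RSCNAME4' > file1
printf '%s\n' 'ORG1|PRGPATH1' 'ORG3|PRGPATH3' 'ORG5|PRGPATH5' \
    'ORG6|PRGPATH6' 'ORG7|PRGPATH7' > file2
append_snap file2 file1 123
```

This prints `DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|RSCNAME1|123` and the corresponding DOMAIN3 line.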
