awk to input column data from one file to another based on a match

Question

Objective

I am trying to fill out $9(booking ref), $10 (client) in file1.csv with information pulled from $2 (booking ref) and $3 (client) of file2.csv using "CAMPAIGN ID" ($5 in file1.csv and $1 in file2.csv). So where I have a match between the two files based on "CAMPAIGN ID" I want to print the columns of file2.csv into the matching rows of file1.csv

File1.csv

INVOICE,CLIENT,PLATFORM,CAMPAIGN NAME,CAMPAIGN ID,IMPS,TFS,PRICE,Booking Ref,client
BOB-UK,clientname1,platform_1,campaign1,20572431,5383594,0.05,2692.18,,
BOB-UK,clientname2,platform_1,campaign2,20589101,4932821,0.05,2463.641,,
BOB-UK,clientname1,platform_1,campaign3,23030494,4795549,0.05,2394.777,,
BOB-UK,clientname1,platform_1,campaign4,22973424,5844194,0.05,2925.21,,
BOB-UK,clientname1,platform_1,campaign5,21489000,4251031,0.05,2122.552,,
BOB-UK,clientname1,platform_1,campaign6,23150347,3123945,0.05,1561.197,,
BOB-UK,clientname3,platform_1,campaign7,23194965,2503875,0.05,1254.194,,
BOB-UK,clientname3,platform_1,campaign8,20578983,1522448,0.05,765.1224,,
BOB-UK,clientname3,platform_1,campaign9,22243554,920166,0.05,463.0083,,
BOB-UK,clientname1,platform_1,campaign10,20572149,118865,0.05,52.94325,,
BOB-UK,clientname2,platform_1,campaign11,23077785,28077,0.05,14.40385,,
BOB-UK,clientname2,platform_1,campaign12,21811100,5439,0.05,5.27195,,

File2.csv

CAMPAIGN ID,Booking Ref,client
20572431,ref1,1
21489000,ref2,1
23030494,ref3,1
22973424,ref4,1
23150347,ref5,1
20572149,ref6,1
20578983,ref7,2
22243554,ref8,2
20589101,ref9,3
23077785,ref10,3
21811100,ref11,3
23194965,ref12,3

Desired Output

INVOICE,CLIENT,PLATFORM,CAMPAIGN NAME,CAMPAIGN ID,IMPS,TFS,PRICE,Booking Ref,client
BOB-UK,clientname1,platform_1,campaign1,20572431,5383594,0.05,2692.18,ref1,1
BOB-UK,clientname2,platform_1,campaign2,20589101,4932821,0.05,2463.641,ref9,3
BOB-UK,clientname1,platform_1,campaign3,23030494,4795549,0.05,2394.777,ref3,1
BOB-UK,clientname1,platform_1,campaign4,22973424,5844194,0.05,2925.21,ref4,1
BOB-UK,clientname1,platform_1,campaign5,21489000,4251031,0.05,2122.552,ref2,1
BOB-UK,clientname1,platform_1,campaign6,23150347,3123945,0.05,1561.197,ref5,1
BOB-UK,clientname3,platform_1,campaign7,23194965,2503875,0.05,1254.194,ref12,3
BOB-UK,clientname3,platform_1,campaign8,20578983,1522448,0.05,765.1224,ref7,2
BOB-UK,clientname3,platform_1,campaign9,22243554,920166,0.05,463.0083,ref8,2
BOB-UK,clientname1,platform_1,campaign10,20572149,118865,0.05,52.94325,ref6,1
BOB-UK,clientname2,platform_1,campaign11,23077785,28077,0.05,14.40385,ref10,3
BOB-UK,clientname2,platform_1,campaign12,21811100,5439,0.05,5.27195,ref11,3

What I've tried

From the research I've done on line this appears to be possible using awk and join (How to merge two files using AWK? to get me the closest out of what I found online).

I've tried various awk codes I've found online and I can't seem to get it to achieve my goal. below is the code I've been trying to massage and work that gets me the closes. At current the code is set up to try and populate just the booking ref as I presume I can just rinse repeat it for the client column. With this code I was able to get it to populate the booking ref but it required me to move CAMPAIGN ID to $1 and all it did was replace the values.

NOTE: The order for file1.csv won't sync with file2.csv. All rows may be in a different order as shown in this example.

current code

awk -F"," -v OFS=',' 'BEGIN { while (getline < "fil2.csv") { f[$1] = $2; } } {print $0, f[$1] }' file1.csv

Can someone confirm where I'm going wrong with this code as I've tried altering the columns in this - and the file - without success? Maybe it's just how I'm understanding the code itself.

RavinderSingh13 · Answer 1 · 2019-11-24T15:01:49.533

Could you please try following.

awk '
BEGIN{
  FS=OFS=","
  print "  print "INVOICE,CLIENT,PLATFORM,CAMPAIGN NAME,CAMPAIGN ID,IMPS,TFS,PRICE,Booking Ref,client"
}
FNR==NR && FNR>1{
  val=$1
  $1=""
  sub(/^,/,"")
  a[val]=$0
  next
}
($5 in a) && FNR>1{
  sub(/,*$/,"")
  print $0,a[$5]
}
'  file2.csv  file1.csv

Explanation: Adding explanation for above code.

awk '                            ##Starting awk program from here.
BEGIN{                           ##Starting BEGIN section of code here.
  FS=OFS=","                     ##Setting FS(field separator) and OFS(output field separator)as comma here.
  print "INVOICE,CLIENT,PLATFORM,CAMPAIGN NAME,CAMPAIGN ID,IMPS,TFS,PRICE,Booking Ref,client"
}                                ##Closing BEGIN section of this program now.
FNR==NR && FNR>1{                         ##Checking condition FNR==NR which will be true when file2.csv is being read.
  val=$1                         ##Creating variable val whose value is $1 here.
  $1=""                          ##Nullifying $1 here.
  sub(/^,/,"")                   ##Substitute initial comma with NULL in this line.
  a[val]=$0                      ##Creating an array a whose index is val and value is $0.
  next                           ##next will skip all further statements from here.
}                                ##Closing BLOCK for condition FNR==NR here.
($5 in a) && FNR>1{                       ##Checking if $5 is present in array a this condition will be checked when file1.csv is being read.
  sub(/,*$/,"")                  ##Substituting all commas at last of line with NULL here.
  print $0,a[$5]                 ##Printing current line and value of array a with index $5 here.
}                                ##Closing BLOCK for above ($5 in a) condition here.
'  file2.csv  file1.csv          ##Mentioning Input_file names here.

Output will be as follows.

INVOICE,CLIENT,PLATFORM,CAMPAIGN NAME,CAMPAIGN ID,IMPS,TFS,PRICE,Booking Ref,client
BOB-UK,clientname1,platform_1,campaign1,20572431,5383594,0.05,2692.18,ref1,1
BOB-UK,clientname2,platform_1,campaign2,20589101,4932821,0.05,2463.641,ref9,3
BOB-UK,clientname1,platform_1,campaign3,23030494,4795549,0.05,2394.777,ref3,1
BOB-UK,clientname1,platform_1,campaign4,22973424,5844194,0.05,2925.21,ref4,1
BOB-UK,clientname1,platform_1,campaign5,21489000,4251031,0.05,2122.552,ref2,1
BOB-UK,clientname1,platform_1,campaign6,23150347,3123945,0.05,1561.197,ref5,1
BOB-UK,clientname3,platform_1,campaign7,23194965,2503875,0.05,1254.194,ref12,3
BOB-UK,clientname3,platform_1,campaign8,20578983,1522448,0.05,765.1224,ref7,2
BOB-UK,clientname3,platform_1,campaign9,22243554,920166,0.05,463.0083,ref8,2
BOB-UK,clientname1,platform_1,campaign10,20572149,118865,0.05,52.94325,ref6,1
BOB-UK,clientname2,platform_1,campaign11,23077785,28077,0.05,14.40385,ref10,3
BOB-UK,clientname2,platform_1,campaign12,21811100,5439,0.05,5.27195,ref11,3

hek2mgl · Accepted Answer · 2019-11-24T13:05:01.783

Like this:

awk 'BEGIN{FS=OFS=","} NR==FNR{r[$1]=$2;c[$1]=$3;next} NR>1{$9=r[$5];$10=c[$5]} 1' \
    file2.csv file1.csv

Explanation in multi line form:

# Set input and output field delimiter to ,
BEGIN{
    FS=OFS=","
}

# Total row number is the same as the row number in file
# as long as we are reading the first file, file2.csv
NR==FNR{
    # Store booking ref and client id indexed by campaign id
    r[$1]=$2
    c[$1]=$3
    # Skip blocks below
    next
} 

# From here code runs only on file1.csv

NR>1{
    # Set booking ref and client id according to the campaign id
    # in field 5
    $9=r[$5]
    $10=c[$5]
}

# Print the modified line of file1.csv (includes the header line)
{
    print
}

awk to input column data from one file to another based on a match

Objective

What I've tried

current code

2 Answers2