I know this question has been asked several times before. Here is one example:
Using AWK to merge two files based on multiple columns
My goal is to print out columns 2, 4, 5 and 7 of file_b and columns 17 and 18 of file_a if the following match occurs: Columns 2, 6 and 7 of file_a.csv matches with Columns 2, 4 and 5 of file_b.csv respectively.
But no matter how much I try, I can't get it to work for my case. Here are my two files:
file_a.csv
col2, col6, col7, col17, col18
a, b, c, 145, 88
e, f, g, 101, 96
x, y, z, 243, 222
file_b.csv
col2, col4, col5, col7
a, b, c, 4.5
e, f, g, 6.3
x, k, l, 12.9
Output should look like this:
col2, col4, col5, col7, col17, col18
a, b, c, 4.5, 145, 88
e, f, g, 6.3, 101, 96
I tried this:
awk -F, -v RS='\r\n' 'NR==FNR{key[$2 FS $6 FS $7]=$17 FS $18;next} {if($2 FS $4 FS $5 in key); print $2 FS $4 FS $5 FS $7 FS key[$2 FS $6 FS $7]}' file_a.csv file_b.csv > out.csv
Currently the output I am getting is:
col2, col4, col5, col7,
a, b, c, 4.5,
e, f, g, 6.3,
In other words, col17 and col18 from file_a is not showing up.
Yesterday I asked a related question where I was having issues with line breaks. That got answered and solved but now I think this problem is related to checking the if condition.
Update: I am sharing links to truncated copies of the actual data. The only difference between these files and the actual ones are that the real ones have millions of rows. These ones only have 10 each.