5

I have a two CSV file which have a common column in both files along with duplicates in one file. How to merge both csv files using awk or sed?

CSV file 1

5/1/20,user,mark,Type1 445566
5/2/20,user,ally,Type1 445577
5/1/20,user,joe,Type1 445588
5/2/20,user,chris,Type1 445566

CSV file 2

Type1 445566,Name XYZ11
Type1 445577,Name AAA22
Type1 445588,Name BBB33
Type1 445566,Name XYZ11

What I want is?

5/1/20,user,mark,Type1 445566,Name XYZ11
5/2/20,user,ally,Type1 445577,Name AAA22
5/1/20,user,joe,Type1 445588,Name BBB33
5/2/20,user,chris,Type1 445566,Name XYZ11

So is there a bash command in Linux/Unix to achieve this? Can we do this using awk or sed?

Basically, I need to match column 4 of CSV file 1 with column 1 of CSV file 2 and merge both csv's.

Tried following command:

Command:

paste -d, <(cut -d, -f 1-2 ./test1.csv | sed 's/$/,Type1/') test2.csv

Got Result:

5/1/20,user,Type1,Type1 445566,Name XYZ11

Srujan reddy
  • 137
  • 1
  • 16

2 Answers2

7

If you are able to install the join utility, this command works:

join -t, -o 1.1 1.2 1.3 2.1 2.2 -1 4 -2 1 file1.csv file2.csv

Explanation:

-t, identify the field separator as comma (',')

-o 1.1 1.2 1.3 2.1 2.2 format the output to be "file1col1, file1col2, file1col3, file2col1, file2col2`

-1 4 join by column 4 in file1

-2 1 join by column 1 in file2

For additional usage information for join, reference the join manpage.

Edit: You specifically asked for the solution using awk or sed so here is the awk implementation:

awk -F"," 'NR==FNR {a[$1] = $2; next} {print $1","$2","$3","$4"," a[$4]}' \
    file2.csv \
    file1.csv

Explanation:

-F"," Delimit by the comma character

NR==FNR Read the first file argument (notice in the above solution that we're passing file2 first)

{a[$1] = $2; next} In the current file, save the contents of Column2 in an array that uses Column1 as the key

{print $1","$2","$3","$4"," a[$4]} Read file1 and using Column4, match the value to the key's value from the array. Print Column1, Column2, Column3, Column4, and the key's value.

0

The two example input files seem to be already appropriately sorted, so you just have to put them side by side, and paste is good for this; however you want to remove some ,-separated columns from file1, and you can use cut for that; but you also want to insert another (constant) column, and sed can do it. A possible command is this:

paste -d, <(cut -d, -f 1-2 file1 | sed 's/$/,abcd/') file2

Actually sed can do the whole processing of file1, and the output can be pided into paste, which uses - to capture it from the standard input:

sed -E 's/^(([^,]+,){2}).*/\1abcd/' file1 | paste -d, - file2
Enlico
  • 23,259
  • 6
  • 48
  • 102
  • Thank you @Enrico Maria De Angelis for a prompt response. By just pasting the file 2 columns won't work I tried this earlier. What I was trying is, extracting the column 2 of file 2 by matching it with column 4 of file 1 and colum 1 of file 2 – Srujan reddy May 13 '20 at 21:37
  • @Srujanreddy have you tried my command on the two file you have provided here? – Enlico May 13 '20 at 21:38
  • I have tried your command with a slight change and it gave me result like this Command used: paste -d, <(cut -d, -f 1-2 ./test1.csv | sed 's/$/,Type1/') test2.csv Result: 5/1/20,user,Type1,Type1 445566,Name XYZ11 – Srujan reddy May 13 '20 at 21:47
  • @Srujanreddy avoid putting code in the comments. Update the question instead, especially if you have to provide a more relevant input and output. – Enlico May 13 '20 at 21:51