Joining two files with multiple columns via AWK

Question

First of all, I must apologise: I know there are already many topics that answer my question, but as you'll see for yourself, AWK isn't really a big friend of mine.

You all know the story, right? ;) "Hey random employee, you are the chosen one! I need you to learn this strange thing that none of us knows. Your deadline is tomorrow, good luck!"

I won't complain about it anymore (promise! :p), but after many tries, I still can't really understand everything (who said "a single thing"?) about AWK.

So, here is my question!

I have two files with the following columns:

File A.txt:

A B C D E F G H

File B.txt:

A C F I

I want to get the following output by joining these two files into a third one:

Output file C.txt:

A B C D E F G H I

I would like to join them on columns A, C and F: append "I" to each line of A.txt whose columns A, C and F match a line of B.txt, and drop the lines that have no match.

So far, I know that I must use something like this:

awk '
    FNR==NR{Something ;next}
    {print $0}
' A.txt B.txt

Yeah, I know. Sounds pretty bad for a start.

Any hero out there?

Will we always be considering the 1st, 3rd and 6th columns from A.txt? Or just lines from B.txt that have 3 values anywhere in some line from A.txt? What if B.txt contains `A B C J`? — glenn jackman, Jan 22 '14 at 16:53
Thanks for your fast reply! Edit: Sorry, I didn't see that you had edited your comment. In the end, columns 1, 3 and 6 of A.txt must match columns 1, 2 and 3 of B.txt on every line. To explain: B.txt only has 4 columns, while A.txt has 8 columns. — Jordan.lamarche, Jan 22 '14 at 16:58
Please see http://stackoverflow.com/questions/5467690/how-to-merge-two-files-using-awk . Seems to be pretty similar question. — Sunny Nanda, Jan 22 '14 at 17:02

score 4 · Accepted Answer · answered Jan 22 '14 at 17:03

awk '
    NR==FNR {A[$1,$3,$6] = $0; next} 
    ($1 SUBSEP $2 SUBSEP $3) in A {print A[$1,$2,$3], $4}
' A.txt B.txt

That requires the whole file A.txt to be stored in memory. If B.txt is significantly smaller, store B.txt instead and stream A.txt:

awk '
    NR==FNR {B[$1,$2,$3] = $4; next}
    ($1 SUBSEP $3 SUBSEP $6) in B {print $0, B[$1,$3,$6]}
' B.txt A.txt
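
To see what the second command does on concrete data, here is a small sketch. The sample rows are made up for illustration (the question only gives column names), fields are assumed to be whitespace-separated, and the files are written to a throwaway temporary directory.

```shell
# Hypothetical sample data; the question only gives column names.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

cat > A.txt <<'EOF'
a1 b1 c1 d1 e1 f1 g1 h1
a2 b2 c2 d2 e2 f2 g2 h2
a3 b3 c3 d3 e3 f3 g3 h3
EOF

cat > B.txt <<'EOF'
a1 c1 f1 i1
a3 c3 f3 i3
a9 c9 f9 i9
EOF

# Pass 1 (B.txt): remember column 4, keyed on columns 1, 2 and 3.
# Pass 2 (A.txt): for lines whose columns 1, 3 and 6 form a known key,
# print the whole line followed by the remembered value.
awk '
    NR==FNR {B[$1,$2,$3] = $4; next}
    ($1 SUBSEP $3 SUBSEP $6) in B {print $0, B[$1,$3,$6]}
' B.txt A.txt > C.txt

cat C.txt
```

With this data, C.txt holds the a1 and a3 lines from A.txt with i1 and i3 appended. The a2 line of A.txt and the a9 line of B.txt have no partner in the other file, so neither appears in the output.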

glenn jackman

@Jordan.lamarche: just so you know, "NR==FNR" is true only while the first file is being read. FNR is the number of records (lines) read so far from the current file or standard input; NR is the total number of records read so far across all files. That lets glenn separate the reading of the first file (building the A[] array, then "next" to skip the remaining rules) from the second file, where the first rule is bypassed (NR is always greater than FNR there) and only the second rule runs. – Olivier Dulac Jan 22 '14 at 17:27
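
The NR/FNR behaviour described in this comment can be checked with a tiny trace. This is only an illustrative sketch: the file names first.txt and second.txt and their contents are hypothetical, and both files are assumed to be non-empty.

```shell
# Two throwaway input files (hypothetical names and contents).
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'x\ny\n' > first.txt
printf 'p\nq\nr\n' > second.txt

# For every line read, show the current file, both counters, and
# whether the NR==FNR test would treat the line as "first file".
awk '{
    print FILENAME, "NR=" NR, "FNR=" FNR, (NR==FNR ? "first file" : "later file")
}' first.txt second.txt > trace.txt

cat trace.txt
```

FNR restarts at 1 when awk opens second.txt while NR keeps counting, so the two counters are equal only on the lines of first.txt.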
