I have two csv (A with 300 000 lines, B with 1 000 000 lines). The records are related by a column ID (relation 1-1), but have not the same column order and are not order by ID. I need to get the 700 000 lines of B for which ID is absent in A.
A.csv
A.ID,A.Field01,A.Field02
2,a,d
4,b,e
1,c,f
B.csv
B.Field01,B.ID,B.Field02
g,2,f
f,4,r
h,6,k
a,1,3
(I want to select h,6,k)
I guess I could solve it with a sgbd like sqlite, but I'm sure it can be done with something more simple and more efficient like awk.
I wanted to adapt this command I found:
awk 'ARGIND==1 {x[$0]++; next} !x[$0]' B.csv A.csv
However this compare the entire lines and I don't know how to select specifically A.ID and B.ID as the fields to compare.
I'm also interested in any other efficient approach to solve this !
Cheers