
This is related to some previous questions.

I have a file like this:

FOO,BAR,100,200,300
BAZ,TAZ,500,600,800
FOO,BAR,900,1000,1000
HERE,THERE,1000,200,100
FOO,BAR,100,10000,200
BAZ,TAZ,100,40,500

The duplicates are determined by the first two fields. In addition, the more "recent" record (lower in the file / higher line number) is the one that should be retained.

What is an awk script that will output:

BAZ,TAZ,100,40,500
FOO,BAR,100,10000,200
HERE,THERE,1000,200,100

Output order is not so important.

Explanation of awk syntax would be great.

noahlz
2 Answers


This is easy in awk: we just need to feed an associative array, using the 1st and 2nd columns combined as the key and the rest of the fields as the value:

$ awk -F, '{a[$1","$2]=$3","$4","$5}END{for(i in a)print i,a[i]}' OFS=, file.txt
BAZ,TAZ,100,40,500
HERE,THERE,1000,200,100
FOO,BAR,100,10000,200
Gilles Quénot
  • Can you please elaborate or link to an explanation of arrays in awk and how they help solve this problem? – noahlz Apr 05 '13 at 22:41
  • 1
    An associative array have unique keys by nature. That's all the magic. Since we keep only the latest values, we just need to iterate over the lines and @the end, display the array line by lines – Gilles Quénot Apr 05 '13 at 22:53
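Since the question asks for an explanation of the syntax, here is the same one-liner expanded with comments. The sample data is inlined via printf so the snippet runs as-is; substitute `file.txt` in its place for real use:

```shell
# Sample input from the question, piped in so the snippet is self-contained
printf '%s\n' \
  'FOO,BAR,100,200,300' \
  'BAZ,TAZ,500,600,800' \
  'FOO,BAR,900,1000,1000' \
  'HERE,THERE,1000,200,100' \
  'FOO,BAR,100,10000,200' \
  'BAZ,TAZ,100,40,500' |
awk -F, '                          # -F, : use comma as the input field separator
{
    key = $1 "," $2                # dedup key: first two fields joined by a comma
    a[key] = $3 "," $4 "," $5      # later lines overwrite earlier ones with the same key
}
END {                              # runs once, after the last input line
    for (i in a)                   # iterate the stored keys (order is unspecified)
        print i "," a[i]           # print key and value, comma-joined
}'
```

Because assignment to `a[key]` simply overwrites any previous entry, the last occurrence of each key in the file wins, which is exactly the "most recent record" rule.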

This might work for you (tac and GNU sort):

tac file | sort -sut, -k1,2
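For reference, a breakdown of that pipeline (the sample data is inlined here via printf so it runs without a separate file; `tac` is from GNU coreutils, and `-s` is a GNU sort extension):

```shell
# tac reverses the file, so the most recent record of each key comes FIRST.
# sort flags:
#   -s      stable: among lines whose keys compare equal, keep input order
#   -u      unique: output only the first line of each group of equal keys
#   -t,     use comma as the field delimiter
#   -k1,2   compare (and deduplicate) on fields 1 through 2 only
printf '%s\n' \
  'FOO,BAR,100,200,300' \
  'BAZ,TAZ,500,600,800' \
  'FOO,BAR,900,1000,1000' \
  'HERE,THERE,1000,200,100' \
  'FOO,BAR,100,10000,200' \
  'BAZ,TAZ,100,40,500' |
tac | sort -s -u -t, -k1,2
```

Together, reversing first and then keeping the first line per key means the last occurrence in the original file is the one retained.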
potong
  • Ok, you demonstrated a solution that doesn't even use awk, and now I know about `tac` and some `sort` options that weren't obvious. Winner. – noahlz Apr 08 '13 at 02:29