Sort out the minimum value for each repeated name

Question

I am trying to find the minimum value for each name in a text file that contains repeated names with different values.

Peter 0.19827
Wilson 0.99234
Peter 0.08234
May -0.45623
Joe 0.88765
Wilson -0.88341
Joe 0.99943

I've tried this, but it's not working (I'd prefer an awk one-liner):

cat aaa.txt | sort -k2nr | awk '{if ($2<min[$1]) {min[$1]=$2}}END{for (i in min) {print i,min[i]}}' | less

The expected output:

Peter 0.08234
Wilson -0.88341
May -0.45623
Joe 0.88765

Looking at your profile, I see that you sometimes don't accept any answer. Please accept one answer as correct for each of your questions. – RavinderSingh13, Jan 16 '19 at 15:54

score 1 · Answer 1 · answered Jan 16 '19 at 08:48

If you are not concerned about preserving the order of the first field from Input_file, try the following.

awk '{a[$1]=(a[$1]!="" && a[$1]<$2 ? a[$1] : $2)} END{for(i in a){print i,a[i]}}' Input_file

RavinderSingh13

Thanks @Ravinder, I prefer tripleee's answer, as it includes the extended version that keeps the first-field order. – Tatt Ehian Jan 16 '19 at 16:05

tripleee · Accepted Answer · 2019-01-16T08:55:04.410

Without the useless cat or the useless sort, and with the bug fixed,

awk '!($1 in min) || $2<min[$1] { min[$1] = $2 }
    END { for (i in min) print i,min[i] }' aaa.txt

The bug is that an uninitialized array element compares as zero in a numeric comparison, so a positive value is never smaller than it. As a result, names whose minimum is positive never got a value stored.

I folded this for legibility; if you prefer, you can remove the embedded newline.
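The uninitialized-element bug described above can be reproduced with a minimal sketch like the following. The `sample` helper and the `buggy` and `fixed` variable names are hypothetical and exist only for this illustration. The output is piped through `sort` because the iteration order of `for (i in min)` is unspecified.

```shell
# Hypothetical helper that prints the sample data from the question.
sample() {
    printf '%s\n' 'Peter 0.19827' 'Wilson 0.99234' 'Peter 0.08234' \
        'May -0.45623' 'Joe 0.88765' 'Wilson -0.88341' 'Joe 0.99943'
}

# The comparison from the question: an unset min[$1] compares as 0,
# so a positive $2 is never "smaller" and never gets stored.
# Merely referencing min[$1] still creates the (empty) element.
buggy=$(sample | awk '{ if ($2 < min[$1]) min[$1] = $2 }
    END { for (i in min) print i, min[i] }' | sort)

# The corrected test: the first value seen for a name is stored unconditionally.
fixed=$(sample | awk '!($1 in min) || $2 < min[$1] { min[$1] = $2 }
    END { for (i in min) print i, min[i] }' | sort)

printf 'buggy:\n%s\n\nfixed:\n%s\n' "$buggy" "$fixed"
```

With the sample data, the first variant leaves `Peter` and `Joe` with an empty value, because every value for those names is positive.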

If preserving order is important, you can add a second array which keeps track of the order in which the keys appeared.

awk '!($1 in min) { k[++i] = $1; min[$1] = $2}
    $2<min[$1] { min[$1] = $2 }
END { for (j=1; j<=i; ++j) print k[j],min[k[j]] }' aaa.txt
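As a quick check, the order-preserving version can be run against the sample data from the question. This is only a sketch: the `sample` helper and the `ordered` variable are hypothetical names introduced here, and the counter is called `n` instead of `i` purely for readability.

```shell
# Hypothetical helper that prints the sample data from the question.
sample() {
    printf '%s\n' 'Peter 0.19827' 'Wilson 0.99234' 'Peter 0.08234' \
        'May -0.45623' 'Joe 0.88765' 'Wilson -0.88341' 'Joe 0.99943'
}

# k[] remembers each name in first-seen order; min[] holds the running minimum.
ordered=$(sample | awk '!($1 in min) { k[++n] = $1; min[$1] = $2 }
    $2 < min[$1] { min[$1] = $2 }
    END { for (j = 1; j <= n; ++j) print k[j], min[k[j]] }')

printf '%s\n' "$ordered"
```

Because the END loop walks `k[1]` through `k[n]` rather than using `for (i in min)`, the names come out in the order Peter, Wilson, May, Joe, which matches the expected output in the question.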

score 0 · Answer 3 · answered Jan 16 '19 at 08:55

One more way if order is not an issue:

sort -k 1,1 -k 2n,2 file | awk '!_[$1]++'
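A rough sketch of how this pipeline works, run on the sample data from the question: `sort` groups the lines by name and puts the smallest value first within each group, and awk then prints only the first line it sees for each name. The `sample` helper, the `first_per_name` variable, and the array name `seen` are hypothetical, and `LC_ALL=C` is an assumption added here so that `.` is always treated as the decimal point.

```shell
# Hypothetical helper that prints the sample data from the question.
sample() {
    printf '%s\n' 'Peter 0.19827' 'Wilson 0.99234' 'Peter 0.08234' \
        'May -0.45623' 'Joe 0.88765' 'Wilson -0.88341' 'Joe 0.99943'
}

# -k 1,1   : primary key is the name (field 1 only)
# -k 2n,2  : secondary key is field 2, compared numerically, ascending
# seen[$1]++ is 0 (false) the first time a name appears, so !seen[$1]++
# is true only for the first, i.e. smallest, line of each name.
first_per_name=$(sample | LC_ALL=C sort -k 1,1 -k 2n,2 | awk '!seen[$1]++')

printf '%s\n' "$first_per_name"
```

The result is ordered by name rather than by first appearance, which is why this approach only fits when the input order does not matter.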

Guru

score 0 · Answer 4 · answered Jan 16 '19 at 09:44

Yet another awk:

$ awk '!($1 in a)||a[$1]>$2{a[$1]=$2}END{for(i in a)print i,a[i]}' file

Output:

May -0.45623
Peter 0.08234
Joe 0.88765
Wilson -0.88341

Explained:

$ awk '
!($1 in a) || a[$1]>$2 {  # if the name is new or the new value is smaller than the stored one
    a[$1]=$2              # store the new value in hash a
}
END {                     # after processing all the records
    for(i in a)           # go through the stored keys
        print i,a[i]      # print each key and its minimum value
}' file

score 0 · Answer 5 · answered Jan 16 '19 at 10:02

You can also try Perl.

$ cat tatt.txt
Peter 0.19827
Wilson 0.99234
Peter 0.08234
May -0.45623
Joe 0.88765
Wilson -0.88341
Joe 0.99943
$ perl -lane ' @t=@{$kv{$F[0]}} ;push(@t,$F[1]);$kv{$F[0]}=[@t]; END { for(keys %kv) { @t=sort { $a <=> $b } @{$kv{$_}}; print "$_,$t[0]" }} ' tatt.txt
Joe,0.88765
May,-0.45623
Wilson,-0.88341
Peter,0.08234
$
