Bash- is it possible to use -uniq for only one column of a line?

Question

    1.gui  Qxx  16
    2.gu   Qxy  23
    3.guT  QWS  18
    4.gui  Qxr  21

I want to sort a file by the value in the 3rd column, so I use:

sort -rnk3 myfile

2.gu   Qxy  23
4.gui  Qxr  21
3.guT  QWS  18
1.gui  Qxx  16

Now I need the output to look like this (the line starting with 1.gui is dropped because the line with 4.gui has the same name and a greater value):

2.gu   Qxy  23
4.gui  Qxr  21
3.guT  QWS  18

I cannot use head because I have millions of rows and do not know where to cut. I could not find a way to use uniq either: it compares whole lines, and since I cannot tell it to look only at the first column, every line counts as unique and is printed, which is the expected behaviour. I know uniq can skip a number of leading characters, but as the example shows, the first column varies in length.

Please advise.

possible duplicate of [Is there a way to 'uniq' by column?](http://stackoverflow.com/questions/1915636/is-there-a-way-to-uniq-by-column) — Ciro Santilli OurBigBook.com, Aug 14 '15 at 12:37

Guru · Accepted Answer · 2012-11-27T12:01:34.043

9

Try this:

sort -rnk3 myfile | awk -F"[. ]" '!a[$2]++'

awk removes the duplicates based on the 2nd field. This is a well-known awk idiom for removing duplicates: an array, keyed by the 2nd field, counts how many times each value has been seen. Before a record is printed, its 2nd field is looked up in the array. If it has not been seen yet, the record is printed; otherwise it is discarded as a duplicate. The ++ is what makes this work. Because it is a postfix increment, a[$2]++ evaluates to 0 the first time a value is encountered, so the negation is true and the line is printed. Subsequent occurrences evaluate to a non-zero count, which becomes false when negated.
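To make the idiom above easier to follow, here is a minimal sketch that spells out the one-liner long-hand, using the sample data from the question. The temporary file and the array name seen are only illustrative; the behaviour should match the short form.

```shell
# Sample data from the question, written to a temporary file.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
1.gui  Qxx  16
2.gu   Qxy  23
3.guT  QWS  18
4.gui  Qxr  21
EOF

# Long-hand equivalent of: awk -F"[. ]" '!a[$2]++'
result=$(sort -rnk3 "$tmpfile" | awk -F'[. ]' '
{
    if (seen[$2] == 0) {   # an unset array element compares equal to 0
        print              # first time this name appears: keep the line
    }
    seen[$2]++             # count the name so later lines with it are skipped
}')

printf '%s\n' "$result"
rm -f "$tmpfile"
```

Because the input is sorted in descending order on the 3rd column first, the line that survives for each name is the one with the greatest value.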

edited Nov 27 '12 at 12:01

answered Nov 27 '12 at 11:43

Guru


It is the 2nd column because we split each line using . and space as delimiters, so the 2nd column gives us gui, gu, etc. – Guru Nov 27 '12 at 12:06

Chris Seymour · Answer 2 · 2012-11-27T11:57:06.763

2

Here you go:

sort -rnk3 file | awk -F'[. ]' '{ if (a[$2]++ == 0) print }' 

2.gu   Qxy  23
4.gui  Qxr  21
3.guT  QWS  18

This uses awk to check for duplicate values in the second field, where the field separator is either a space or a period. This is what it treats as the second field:

$ awk -F'[. ]' '{ print $2 }' file

gu
gui
guT
gui

In awk, the variable $0 represents the whole line, $1 represents the first field, and so on.

In awk -F'[. ]' '{ if (a[$2]++ == 0) print }', the -F option lets you specify the field separator; in this case it is either a space or a period.
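One detail worth knowing about this separator, shown as a small sketch below: with '[. ]' every single space ends a field, so the runs of spaces between columns produce empty fields. That does not matter here because only $2 is used, but if you also needed the later columns, a separator such as '[. ]+' (my own variation, not part of the answer above) treats each run as one separator.

```shell
line='4.gui  Qxr  21'

# Separator '[. ]': each single period or space ends a field,
# so a run of two spaces yields an empty field in between.
single=$(printf '%s\n' "$line" | awk -F'[. ]' '{ printf "%d|%s|%s|%s", NF, $1, $2, $3 }')

# Separator '[. ]+': a run of periods/spaces counts as one separator.
run=$(printf '%s\n' "$line" | awk -F'[. ]+' '{ printf "%d|%s|%s|%s", NF, $1, $2, $3 }')

# Each result is: field count | $1 | $2 | $3
printf '%s\n%s\n' "$single" "$run"
```

This prints 6|4|gui| for the first form and 4|4|gui|Qxr for the second, so $2 is the name in both cases.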

edited Nov 27 '12 at 11:57

answered Nov 27 '12 at 11:51

Chris Seymour


hey @sudo_O ..thanks again. Can you please explain the awk command a little? – teutara Nov 27 '12 at 11:54

score 0 · Answer 3 · answered Jun 21 '13 at 18:29

I found this thread through Google. My little script builds on @sudo_O's answer: instead of producing a file without duplicates, it shows all the duplicate lines that were found.

The lines I was checking for duplicates in the 3rd column (port) were in a file called master.txt:

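awk '{ if (a[$3]++ > 0) print }' master.txt | while read -r site thread port
do
  # Quoted so the value is passed to grep intact; note that grep matches the port anywhere on the line
  grep -- "$port" master.txt
done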
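As a possible alternative to the loop, here is a sketch that lets awk read the file twice: the first pass counts each 3rd-column value, and the second pass prints every line whose value occurred more than once. The sample rows are made up for illustration; only the file layout (site, thread, port) is taken from the script above.

```shell
# Hypothetical stand-in for master.txt: site, thread, port.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
siteA t1 8080
siteB t2 9090
siteC t3 8080
siteD t4 7070
EOF

# First pass (NR == FNR): count each port.
# Second pass: print the lines whose port was counted more than once.
dups=$(awk 'NR == FNR { count[$3]++; next } count[$3] > 1' "$tmpfile" "$tmpfile")

printf '%s\n' "$dups"
rm -f "$tmpfile"
```

This compares only the 3rd column, so a port number that happens to appear inside another field is not matched, and each duplicate line is printed once.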
