Sort duplicates by column

Question

Please consider the following example:

Two-column data:

ti piace o no la apple p181026 07348
ti piace o no la apple p181026 07349
ti piace o no la apple p181026 07345

where the p[0-9]\s[0-9] sequence is separated from the first column by a tab.

I would like to remove duplicates based only on the first column (the alphabetic part of the line). I tried:

sort  -u -t$'\t' -k1 -nr inputfile > out

and with

sort -t$'\t' -k1 -nr inputfile | uniq > out

with no success. I am afraid I am missing something obvious, but even after consulting other relevant questions on the matter I still cannot figure it out.

Thanks in advance for sharing your experience with me.

Possible duplicate of [Is there a way to 'uniq' by column?](https://stackoverflow.com/questions/1915636/is-there-a-way-to-uniq-by-column) — Corentin Limier, Jul 02 '19 at 16:13

score 1 · Accepted Answer · answered Jul 02 '19 at 16:21

With GNU sort and bash:

sort -t $'\t' -k 1,1 -u file

Output:

ti piace o no la apple  p181026 07348
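
As an illustration of why the key definition matters, here is a minimal sketch, assuming a sort that follows the usual -k semantics (GNU sort does). With -k 1 the key starts at field 1 and runs to the end of the line, so lines that differ only in the second column still count as distinct; with -k 1,1 the key stops at the end of field 1. The temporary file and the second first-column value (un altro esempio) are made up for this example.

```shell
# Hypothetical sample: tab-separated, two rows share the same first column.
tmp=$(mktemp)
printf 'ti piace o no la apple\tp181026 07348\n'  > "$tmp"
printf 'ti piace o no la apple\tp181026 07349\n' >> "$tmp"
printf 'un altro esempio\tp181026 07345\n'       >> "$tmp"

tab=$(printf '\t')   # portable way to get a literal tab for -t

# Key = field 1 through end of line: every line is unique, nothing is dropped.
whole_line=$(LC_ALL=C sort -t "$tab" -k 1 -u "$tmp")

# Key = field 1 only: lines sharing the first column collapse to one.
first_field=$(LC_ALL=C sort -t "$tab" -k 1,1 -u "$tmp")

n_whole=$(printf '%s\n' "$whole_line" | grep -c '')
n_first=$(printf '%s\n' "$first_field" | grep -c '')

echo "lines kept with -k 1:   $n_whole"   # 3
echo "lines kept with -k 1,1: $n_first"   # 2

rm -f "$tmp"
```

The -n and -r flags from the question are left out here because the first column is text, not a number.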

Cyrus

It appears to work perfectly, thanks. Now it is clear to me how I messed up the flags. – Worice Jul 02 '19 at 16:25

score 0 · Answer 2 · answered Jul 02 '19 at 16:21

Since the delimiter is not clear from your samples, I am going with the p[0-9]\s[0-9] regex you mentioned. Could you please try the following?

awk 'match($0,/p[0-9]+ +[0-9]+/){a=substr($0,1,RSTART-1)} !array[a]++' Input_file
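
If the file really is tab-separated, a simpler awk variant may be enough. The sketch below uses the common !seen[$1]++ idiom with the field separator set to a tab; it keeps the first line seen for each first-column value and preserves input order without sorting. The variable names and the second sample value (un altro esempio) are hypothetical.

```shell
# Hypothetical tab-separated input held in a variable for the demo.
input=$(printf 'ti piace o no la apple\tp181026 07348\nti piace o no la apple\tp181026 07349\nun altro esempio\tp181026 07345\n')

# seen[$1]++ is 0 (false) the first time a first-column value appears,
# so !seen[$1]++ is true only for the first occurrence and that line is printed.
deduped=$(printf '%s\n' "$input" | awk -F '\t' '!seen[$1]++')

printf '%s\n' "$deduped"
```

On a real file this would be run as awk -F '\t' '!seen[$1]++' Input_file.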

RavinderSingh13

The file is tab separated. – Worice Jul 02 '19 at 16:23

Thanks for your interesting approach! – Worice Jul 02 '19 at 16:26

@Worice, I was going for a simple approach :) but it seems the tabs were lost when your samples were pasted, so I went with your second hint and used a regex. Cheers :) – RavinderSingh13 Jul 02 '19 at 16:27
