Please consider the following example:

Two-column data:

ti piace o no la apple p181026 07348
ti piace o no la apple p181026 07349
ti piace o no la apple p181026 07345

where the p[0-9]\s[0-9] sequence is tab separated from the first column.

I would like to remove duplicates based only on the first column (the alphabetic part of the line). I tried:

sort  -u -t$'\t' -k1 -nr inputfile > out

and with

sort -t$'\t' -k1 -nr inputfile | uniq > out 

with no success. I am afraid I am missing something obvious, but even after consulting other relevant questions on the matter I still cannot figure it out.

Thanks in advance for sharing your experience with me.

Worice
Possible duplicate of [Is there a way to 'uniq' by column?](https://stackoverflow.com/questions/1915636/is-there-a-way-to-uniq-by-column) – Corentin Limier Jul 02 '19 at 16:13

2 Answers

With GNU sort and bash:

sort -t $'\t' -k 1,1 -u file

Output:

ti piace o no la apple  p181026 07348
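
A minimal sketch of why the key range matters, assuming GNU sort; the file name `sample.tsv` is hypothetical. `-k 1,1` restricts the key to the first tab-separated field, whereas `-k 1` alone extends the key from field 1 to the end of the line, so the differing second column keeps every line distinct under `-u`.

```shell
# Build a small tab-separated sample (hypothetical file name).
tab=$(printf '\t')
printf 'ti piace o no la apple\tp181026 07348\n'  > sample.tsv
printf 'ti piace o no la apple\tp181026 07349\n' >> sample.tsv
printf 'ti piace o no la apple\tp181026 07345\n' >> sample.tsv

# Key limited to field 1: all three lines compare equal, so one survives.
dedup_count=$(sort -t "$tab" -k 1,1 -u sample.tsv | wc -l)

# Key runs from field 1 to end of line: the lines differ, so all three stay.
full_count=$(sort -t "$tab" -k 1 -u sample.tsv | wc -l)

echo "lines kept with -k 1,1: $dedup_count"
echo "lines kept with -k 1:   $full_count"
```

The `-n` option from the question is left out here on purpose: the first column is text, so a numeric comparison would not distinguish one key from another.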
Cyrus

Since the delimiter is not clear from your sample, I am going with the p[0-9]\s[0-9] regex you mentioned. Could you please try the following:

awk 'match($0,/p[0-9]+ +[0-9]+/){a=substr($0,1,RSTART-1)} !array[a]++' Input_file
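
If the two columns really are tab-separated, as the question states, a simpler sketch is to key directly on the first field rather than locating the p[0-9]... pattern. This assumes a POSIX awk; the file name `sample.tsv` and the second sample key (`another line`) are made up for illustration. It keeps the first line seen for each distinct first column and preserves input order, with no sorting needed.

```shell
# Build a small tab-separated sample (hypothetical file name and contents).
printf 'ti piace o no la apple\tp181026 07348\n'  > sample.tsv
printf 'ti piace o no la apple\tp181026 07349\n' >> sample.tsv
printf 'another line\tp181026 07345\n'           >> sample.tsv

# seen[$1]++ evaluates to 0 the first time a key appears, so the negated
# expression is true exactly once per distinct first field and awk prints
# only that first occurrence.
result=$(awk -F'\t' '!seen[$1]++' sample.tsv)
printf '%s\n' "$result"
```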
RavinderSingh13