
I have a column that contains duplicate values. I would like to delete the lines with duplicates, but keep the first 2 instances of each value.

In other words: remove lines whose value has already appeared 2 times.

Example input

i 10
i 10
a 12
a 12
b 12
b 12
c 14
c 14
x 14
x 14
y 14
y 14
a 14
a 14
n 13
n 13
m 13
m 13
x 13
x 13

Desired output

i 10
i 10
a 12
a 12
c 14
c 14
n 13
n 13

I tried

awk '!a[$2]++' file

Appreciate your help

Inian
OXXO
  • The duplicates are in column 2 – OXXO Dec 01 '17 at 20:54
  • Possible duplicate of [awk - Remove line if field is duplicate](https://stackoverflow.com/questions/2604088/awk-remove-line-if-field-is-duplicate). You only alter it a little: `awk '{ if ( a[$2]++ <= 1 ) print; }' file` – PesaThe Dec 01 '17 at 21:01

1 Answer


I think the problem with your command is that `!a[$2]++` checks whether a line is the first one with a given value in column 2, instead of checking whether it is one of the first two. Something like this should work:

awk 'a[$2]++<2' file
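
To see why this works, here is a longer sketch of the same idea, run against the example input from the question. The temporary file and the names `tmp`, `count`, `seen`, and `result` are only for illustration and are not part of the original command.

```shell
# Recreate the example input from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' 'i 10' 'i 10' 'a 12' 'a 12' 'b 12' 'b 12' \
  'c 14' 'c 14' 'x 14' 'x 14' 'y 14' 'y 14' 'a 14' 'a 14' \
  'n 13' 'n 13' 'm 13' 'm 13' 'x 13' 'x 13' > "$tmp"

# Spelled-out equivalent of: awk 'a[$2]++<2' file
result=$(awk '{
    seen = count[$2] + 0   # lines seen so far with this column-2 value (0 if none)
    count[$2]++            # count the current line as well
    if (seen < 2)          # true only for the first and second occurrence
        print
}' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

The post-increment is what matters: the comparison uses the count from before the current line was added, so the test sees 0 and 1 for the first two lines with a given value and 2 or more afterwards. On the example input this should print the eight lines listed under the desired output.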
EdmCoff