
I have a TSV file with data on some event participants. Here is a small snippet from it:

...
sub-09          37   F    19780726   20160328    20160329
sub-10          38   F    19780208   20160406    20160407
sub-11          39   M    19770511   20160704    20160705
...
sub-42          37   F    19780726   20160328    20160329
...

Note that sub-09 and sub-42 are duplicates.

In bash, how can I find duplicate lines while ignoring the first (or, in general, any other) column? I've seen similar threads, e.g., this one, but I couldn't find an answer that fits. Ideally I would get both occurrences of all duplicates, as in:

Expected output:

sub-09          37   F    19780726   20160328    20160329
sub-42          37   F    19780726   20160328    20160329
Daniel

3 Answers


Use uniq -d to show duplicates. Use its -f option to skip fields. As uniq needs the input sorted, first sort ignoring the first column:

sort -nk2 file | uniq -f1 -d

Use -D instead of -d if you want all the duplicates.
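A minimal sketch of the `-D` variant, assuming GNU coreutils (`uniq -D` is a GNU extension) and a hypothetical file name `participants.tsv` with single-space separators for brevity:

```shell
# Build a small sample file (hypothetical name; substitute your own).
cat > participants.tsv <<'EOF'
sub-09 37 F 19780726 20160328 20160329
sub-10 38 F 19780208 20160406 20160407
sub-11 39 M 19770511 20160704 20160705
sub-42 37 F 19780726 20160328 20160329
EOF

# Sort on everything from field 2 onward so duplicates become adjacent,
# then have uniq skip the first field (-f1) and print every duplicate
# occurrence (-D).
sort -k2 participants.tsv | uniq -f1 -D
# prints both the sub-09 and sub-42 lines
```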

choroba

Here is an awk-based solution that avoids sorting the file (which can be pretty expensive for a large file):

awk '{
   p = $1                   # remember column 1
   $1 = ""                  # blank it out; $0 now holds the remaining columns
   freq[$0]++               # count occurrences of the remainder
   col1[$0,freq[$0]] = p    # remember which column-1 value carried it
}
END {
   for (i in freq)
      for (j = 1; freq[i] > 1 && j <= freq[i]; j++)
         print col1[i,j] i
}' file

sub-09 37 F 19780726 20160328 20160329
sub-42 37 F 19780726 20160328 20160329
anubhava
Read the file twice: the first pass counts each line with column 1 blanked out; the second pass prints the original line whenever its remainder occurred more than once:

awk 'FNR==NR{$1="";a[$0]++;next}{s=$0;$1="";if(a[$0]>=2) print s}' file file
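The same two-pass idea written out with comments, assuming a hypothetical file name `participants.tsv` (pass your file twice):

```shell
# Build a small sample file (hypothetical name; substitute your own).
cat > participants.tsv <<'EOF'
sub-09 37 F 19780726 20160328 20160329
sub-10 38 F 19780208 20160406 20160407
sub-42 37 F 19780726 20160328 20160329
EOF

awk '
  # First pass (FNR==NR): blank out column 1 and count each remainder.
  FNR == NR { $1 = ""; count[$0]++; next }
  # Second pass: print the original line if its remainder occurred twice or more.
  { line = $0; $1 = ""; if (count[$0] >= 2) print line }
' participants.tsv participants.tsv
# prints the sub-09 and sub-42 lines
```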
zxy