
I have an input file with the following data:

line1
line2
line3
begin
line5
line6
line7
end
line9
line1
line3

I am trying to find all the duplicate lines. I tried:

sort filename | uniq -c  

but it does not seem to be working for me.

It gives me:

  1 begin
  1 end
  1 line1
  1 line1
  1 line2
  1 line3
  1 line3
  1 line5
  1 line6
  1 line7
  1 line9

The question may seem like a duplicate of "Find duplicate lines in a file and count how many time each line was duplicated?", but the nature of the input data is different.

Please suggest.

Vicky
    If I try to reproduce your problem, I get lines like `2 line3`, so probably there is a problem with spacing after `line1` etc in the source file. – Willem Van Onsem Jan 09 '17 at 12:24
    Thanks Will, there was indeed a spacing problem; I removed the space and the result is OK. – Vicky Jan 09 '17 at 12:29
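
As the comment exchange above indicates, the lines only looked identical; trailing whitespace made `uniq` treat them as different. A minimal sketch of normalizing that before counting, assuming the stray characters are ordinary spaces or tabs:

  # strip trailing whitespace, then sort and count occurrences
  sed 's/[[:space:]]*$//' filename | sort | uniq -c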

3 Answers


Use this:

sort filename | uniq -d
man uniq
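
`uniq -d` prints only the lines that occur more than once in the sorted input. On the sample data from the question (assuming the trailing-space issue mentioned in the comments has been fixed), this would give something like:

  $ sort filename | uniq -d
  line1
  line3
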
Angel Bochev

Try

sort -u file

or

awk '!a[$0]++' file
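
Note that both of these de-duplicate rather than list the repeats: `sort -u` prints each distinct line once in sorted order, and the awk one-liner prints each line only the first time it is seen, keeping the original order. On the sample input, for example:

  $ awk '!a[$0]++' file
  line1
  line2
  line3
  begin
  line5
  line6
  line7
  end
  line9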

wsdzbm

You'll have to modify the standard de-dupe code just a tiny bit to account for this:

If you want a single copy of each duplicated line, it's very much the same idea:

  {m,g}awk 'NF~ __[$_]++' FS='^$'
  {m,g}awk '__[$_]++==!_'

If you want every copy of the duplicated lines printed, then whenever the condition yields true for the first time, print two copies of the line, and also print each new match along the way.
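
A minimal awk sketch of that idea (my own illustration of the description above, not code from this answer):

  # print every occurrence of any line that appears more than once:
  # on its 2nd occurrence print it twice (covering the 1st copy we skipped),
  # then print each later occurrence as it arrives
  awk '{ if (++seen[$0] == 2) { print; print } else if (seen[$0] > 2) print }' file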

Usually it's much faster to de-dupe first and then sort, rather than the other way around.

RARE Kpop Manifesto