
I have an input file with the following data:

line1
line2
line3
begin
line5
line6
line7
end
line9
line1
line3

I am trying to find all the duplicate lines. I tried:

sort filename | uniq -c  

but it does not seem to be working for me.

It gives me:

  1 begin
  1 end
  1 line1
  1 line1
  1 line2
  1 line3
  1 line3
  1 line5
  1 line6
  1 line7
  1 line9

The question may seem like a duplicate of "Find duplicate lines in a file and count how many time each line was duplicated?", but the nature of the input data is different.

Please suggest.

Vicky
    If I try to reproduce your problem, I get lines like `2 line3`, so probably there is a problem with spacing after `line1` etc in the source file. – Willem Van Onsem Jan 09 '17 at 12:24
    Thanks Will, there was indeed a spacing problem; I removed the space and the result is OK. – Vicky Jan 09 '17 at 12:29
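
As the comment exchange above indicates, the lines only looked identical; trailing whitespace made `uniq` treat them as different. A minimal sketch of normalizing that before counting, assuming the stray characters are ordinary spaces or tabs:

  # strip trailing whitespace, then sort and count occurrences
  sed 's/[[:space:]]*$//' filename | sort | uniq -c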

3 Answers


Use this:

sort filename | uniq -d
man uniq
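
`uniq -d` prints only the lines that occur more than once in the sorted input. On the sample data from the question (assuming the trailing-space issue mentioned in the comments has been fixed), this would give something like:

  $ sort filename | uniq -d
  line1
  line3
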
Angel Bochev

Try

sort -u file

or

awk '!a[$0]++' file
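
Note that both of these de-duplicate rather than list the repeats: `sort -u` prints each distinct line once in sorted order, and the awk one-liner prints each line only the first time it is seen, keeping the original order. On the sample input, for example:

  $ awk '!a[$0]++' file
  line1
  line2
  line3
  begin
  line5
  line6
  line7
  end
  line9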

wsdzbm

You'll have to modify the standard de-dupe code just a tiny bit to account for this:

If you want a single copy of each duplicated line, it's very much the same idea:

  {m,g}awk 'NF~ __[$_]++' FS='^$'
  {m,g}awk '__[$_]++==!_'

If you want every copy of the duplicated lines printed, then whenever the condition yields true for the first time, print two copies of the line, and also print each new match along the way.
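
A minimal awk sketch of that idea (my own illustration of the description above, not code from this answer):

  # print every occurrence of any line that appears more than once:
  # on its 2nd occurrence print it twice (covering the 1st copy we skipped),
  # then print each later occurrence as it arrives
  awk '{ if (++seen[$0] == 2) { print; print } else if (seen[$0] > 2) print }' file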

Usually it's much faster to de-dupe first and then sort, rather than the other way around.

RARE Kpop Manifesto