How can I find duplicated lines in my file?

Question

How can I find out whether my file contains any duplicated lines?

Many of my files (which I work with in vi) contain a large number of molecular coordinates. Sometimes the software I use writes a second copy of the coordinates on top of the first set. This goes unnoticed, and I only find out that the file has repeated coordinates when I start using the molecule in simulations.

With plain grep, I would have to test every line individually to see whether it occurs elsewhere in the file.

Is there a better approach?

Example:

C          8.72073       15.19207       10.44503

C          9.57223       14.02835       10.59743

C         10.54225       13.88199        9.86998

This block is repeated later in the file.

Do those duplications span the whole line, or can they be substrings of any line? — Deleted User, Jun 27 '14 at 19:10
The duplications span whole lines. For example, C 1.23 3.45 4.56 H 4.56 3.45 4.56 might repeat. — quarktosh, Jun 27 '14 at 19:11
Give us a few lines from the file, preferably ones with duplicates. — Korem, Jun 27 '14 at 19:13
@mpapec Desired output: something that tells me which line is repeated, along with its line number. — quarktosh, Jun 27 '14 at 19:25
@mpapec No, the last two lines are not duplicates. I am searching for exactly duplicated lines, and trying to delete them. — quarktosh, Jun 27 '14 at 20:08

score 0 · Answer 1 · answered Jun 27 '14 at 19:20

Use sort and uniq -c to count how often each line occurs, then sed to drop the lines seen only once and strip the counts:

Example:

echo -e 'a\nb\nc\na\nb'
a
b
c
a
b

echo -e 'a\nb\nc\na\nb' | sort | uniq -c
      2 a
      2 b
      1 c

echo -e 'a\nb\nc\na\nb' | sort | uniq -c | sed -re '/^\s+1\s+/d; s/^\s+[0-9]+\s+//g'
a
b
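The pipeline above lists which lines repeat, but not where they occur, and the comments also ask for line numbers and for a way to delete the duplicates. A minimal sketch using awk (a swapped-in tool, not part of the original answer) could look like this. The function names report_duplicates and remove_duplicates are hypothetical. It assumes duplicates are exact whole-line matches, as clarified in the comments, and it skips blank lines so that empty lines between coordinate rows are not reported as duplicates.

```shell
# Hypothetical helper: print each repeated non-blank line with its own
# line number and the line number where it first appeared.
# Blank (or whitespace-only) lines are skipped.
report_duplicates() {
    awk 'NF == 0 { next }
         ($0 in first) { printf "%d: %s (duplicate of line %d)\n", NR, $0, first[$0]; next }
         { first[$0] = NR }' "$1"
}

# Hypothetical helper: write the file to stdout keeping only the first
# occurrence of each non-blank line. Blank lines are always kept, and
# the original line order is preserved.
remove_duplicates() {
    awk 'NF == 0 || !seen[$0]++' "$1"
}
```

Usage, with coords.xyz standing in for your file:

report_duplicates coords.xyz
remove_duplicates coords.xyz > coords_dedup.xyz

Unlike sort | uniq, this keeps the lines in their original order, which matters for a coordinate file. Lines are compared byte for byte, so two rows that differ only in spacing or trailing whitespace count as distinct. If you only need one copy of each repeated line without line numbers, sort file | uniq -d does that in a single step.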
