I have a text file that contains a long list of entries (one on each line). Some of these are duplicates, and I would like to know if it is possible (and if so, how) to remove any duplicates. I am interested in doing this from within vi/vim, if possible.
-
Looks like a duplicate of http://stackoverflow.com/questions/746689/unix-tool-to-remove-duplicate-lines-from-a-file – Nathan Fellman Feb 18 '10 at 21:01
-
This one is 1 year old; that one is 10 months. So, other way around. – Sydius Feb 26 '10 at 19:50
-
@Sydius consensus now is to prioritize upvote count (which you also have more of): http://meta.stackexchange.com/questions/147643/should-i-vote-to-close-a-duplicate-question-even-though-its-much-newer-and-ha And those are not duplicates, that one does not mention Vim :-) – Ciro Santilli OurBigBook.com Aug 08 '16 at 08:38
14 Answers
If you're OK with sorting your file, you can use:
:sort u
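For illustration: given a buffer containing
banana
apple
banana
apple
running :sort u sorts the lines and keeps a single copy of each:
apple
banana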

-
If sorting is unacceptable, use `:%!uniq` to simply remove duplicate entries without sorting the file. – cryptic0 Mar 06 '18 at 14:39
-
Once you use the command, the whole file changes. How do you go back? I already saved the file by mistake ... my bad – nilon May 04 '18 at 14:51
-
Just use [Vim's undo command](http://vimdoc.sourceforge.net/htmldoc/undo.html): `u` – adampasz Aug 11 '19 at 00:23
-
@adampasz but I already closed the file and it does not seem to remember data from the last session. (This is a downside of always closing files while saving by just pressing `ZZ`.) – nilon Feb 25 '21 at 18:16
-
@cryptic0, uniq won't work unless the duplicate lines are adjacent (e.g. after sorting); on `a$b$a$` it does nothing – CervEd May 02 '21 at 17:58
-
@nilon, that's where persistent undo comes in handy https://stackoverflow.com/questions/5700389/using-vims-persistent-undo – egst Oct 14 '22 at 09:54
-
You can select the lines you want sorted and deduplicated first with `V` or something similar, then issue the command. – Lorenz Leitner May 11 '23 at 12:32
Try this:
:%s/^\(.*\)\(\n\1\)\+$/\1/
It searches for any line immediately followed by one or more copies of itself, and replaces it with a single copy.
Make a copy of your file though before you try it. It's untested.
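For instance, in a buffer containing
foo
foo
bar
the substitution collapses the two foo lines into one, leaving
foo
bar
Note that it only catches copies that sit on consecutive lines.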
-
@hop Thanks for testing it for me. I didn't have access to vim at the time. – Sean Dec 09 '08 at 20:50
-
this highlights all the duplicate lines for me but doesn't delete them; am I missing a step here? – ak85 Sep 14 '12 at 23:57
-
I'm pretty sure this will also highlight a line followed by a line that has the same "prefix" but is longer. – hippietrail Apr 29 '15 at 01:47
-
This is the better solution and doesn't change the line numbers either. Thanks – Amir Oct 02 '17 at 05:44
-
The only issue with this is that if you have multiple duplicates (3 or more of the same line), you have to run it several times until all dups are gone, since it removes only one set of dups per pass. – horta Jan 22 '18 at 23:56
-
Another drawback of this: this won't work unless your duplicate lines are already next to each other. Sorting first would be one way of ensuring they're next to each other. At that point, the other answers are probably better. – horta Mar 09 '19 at 22:42
From the command line just do:
sort file | uniq > file.new
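A one-step alternative, assuming a POSIX-compliant sort: -u drops duplicate lines during the sort, and -o lets you write the result safely back onto the input file, because sort reads all of its input before writing any output.
sort -u file -o file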
-
Couldn't get the accepted answer to work, as `:sort u` was hanging on my large file. This worked very quickly and perfectly. Thank you! – TayTay Mar 23 '15 at 15:50
-
`'uniq' is not recognized as an internal or external command, operable program or batch file.` – hippietrail Apr 29 '15 at 01:49
-
Yes -- I tried this technique on a 2.3 GB file, and it was shockingly quick. – DanM Feb 06 '17 at 19:46
-
@hippietrail Are you on a Windows PC? Maybe you can use Cygwin. – 12431234123412341234123 May 08 '18 at 12:55
awk '!x[$0]++' yourfile.txt
if you want to preserve the order (i.e., sorting is not acceptable). To invoke it from vim, `:!` can be used.
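For example, to filter the whole buffer through awk in place, escape the exclamation mark; Vim replaces an unescaped ! inside a :! command with the previously run command:
:%!awk '\!x[$0]++'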

-
This is lovely! Not needing to sort is *exactly* what I was looking for! – Cometsong Oct 13 '17 at 16:36
-
This can also be done in perl if it strikes your fancy `perl -nle 'print unless $seen{$_}++' yourfile.txt` – Billious Mar 30 '23 at 19:43
g/^\(.*\)$\n\1/d
Works for me on Windows. Lines must be sorted first though.
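An untested sketch of a stricter variant: anchoring the back-reference with a second $ should keep a line from being deleted when it is merely a prefix of the following line (e.g. aaaa before aaaabb):
g/^\(.*\)$\n\1$/d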

-
This will delete a line following a line which is its prefix: `aaaa` followed by `aaaabb` will delete `aaaa` erroneously. – hippietrail Apr 29 '15 at 01:51
I would combine two of the answers above: go to the head of the file, sort the whole file, then remove duplicate entries with uniq:
1G
!Gsort
1G
!Guniq
If you were interested in seeing how many duplicate lines were removed, use control-G before and after to check on the number of lines present in your buffer.
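The same pipeline can also be run as a single ex command over the whole buffer:
:%!sort | uniq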

-
`'uniq' is not recognized as an internal or external command, operable program or batch file.` – hippietrail Apr 29 '15 at 01:48
If you don't want to sort/uniq the entire file, you can select the lines you want to make uniq in visual mode and then simply:
:sort u

-
If you know the line numbers you want sorted to unique, you can prefix the starting and ending line numbers, e.g. to sort+unique lines 5 through 10 the command would be `:5,10 sort u` – Billious Mar 30 '23 at 19:39
Select the lines in visual-line mode (Shift+v), then run :!uniq. That'll only catch duplicates which come one after another.
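With a visual selection active, pressing : pre-fills the range for you, so the command line actually ends up as:
:'<,'>!uniq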
-
Just to note, this will only work on computers with the uniq program installed, i.e. Linux, Mac, FreeBSD, etc. – anteatersa Feb 11 '14 at 11:26
-
This will be the best answer for those who don't need sorting. And if you are a Windows user, consider trying Cygwin or MSYS. – fx-kirin Jun 30 '16 at 04:44
Regarding how Uniq can be implemented in VimL, search for Uniq in a plugin I'm maintaining. You'll see various ways to implement it that were given on Vim mailing-list.
Otherwise, :sort u is indeed the way to go.

:%s/^\(.*\)\(\n\1\)\+$/\1/gec
or
:%s/^\(.*\)\(\n\1\)\+$/\1/ge
It can remove runs of multiple duplicate lines in one pass, keeping a single copy rather than deleting them all. The e flag suppresses the error when no duplicates are found; the c flag additionally asks for confirmation before each change.

I would use !}uniq, but that only works if there are no blank lines.
For every line in a file, use :1,$!uniq.
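Since % is shorthand for the range 1,$, the same filter can be written more briefly:
:%!uniq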

This version only removes repeated lines that are contiguous, i.e. it only deletes consecutive repeated lines. With the given mapping, the function does not mess with blank lines; but if you change the regex to match the start of line (^), it will also remove duplicated blank lines.
" function to delete duplicate lines
function! DelDuplicatedLines()
while getline(".") == getline(line(".") - 1)
exec 'norm! ddk'
endwhile
while getline(".") == getline(line(".") + 1)
exec 'norm! dd'
endwhile
endfunction
nnoremap <Leader>d :g/./call DelDuplicatedLines()<CR>

An alternative method that does not use vi/vim (for very large files) is to use sort and uniq from the Linux command line:
sort {file-name} | uniq
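Note that plain uniq keeps one copy of each repeated line, whereas uniq -u drops every line that occurs more than once; a quick way to see the difference:
printf 'a\na\nb\n' | sort | uniq      # outputs: a, b
printf 'a\na\nb\n' | sort | uniq -u   # outputs: b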

This worked for me for both .csv and .txt files:
awk '!seen[$0]++' <filename> > <newFileName>
Explanation: the first part of the command, awk '!seen[$0]++' <filename>, prints unique rows; the second part, > <newFileName>, redirects that output into the new file.
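A quick check that this preserves the original order, unlike the sort-based approaches:
printf 'b\na\nb\na\n' | awk '!seen[$0]++'   # outputs: b, a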
