79

How would you go about marking all of the lines in a buffer that are exact duplicates of other lines? By marking them, I mean highlighting them or adding a character or something. I want to retain the order of the lines in the buffer.

Before:

foo
bar
foo
baz

After:

foo*
bar
foo*
baz
Krisztián Balla
Brian Carper

6 Answers

116

As an ex one-liner:

:syn clear Repeat | g/^\(.*\)\n\ze\%(.*\n\)*\1$/exe 'syn match Repeat "^' . escape(getline('.'), '".\^$*[]') . '$"' | nohlsearch

This uses the Repeat group to highlight the repeated lines.

Breaking it down:

  • syn clear Repeat :: remove any previously found repeats
  • g/^\(.*\)\n\ze\%(.*\n\)*\1$/ :: for any line that is repeated later in the file
    • the regex
      • ^\(.*\)\n :: a full line
      • \ze :: end of match - verify the rest of the pattern, but don't consume the matched text (positive lookahead)
      • \%(.*\n\)* :: any number of full lines
      • \1$ :: a full line repeat of the matched full line
    • exe 'syn match Repeat "^' . escape(getline('.'), '".\^$*[]') . '$"' :: add full lines that match this to the Repeat syntax group
      • exe :: execute the given string as an ex command
      • getline('.') :: the contents of the current line matched by g//
      • escape(..., '".\^$*[]') :: escape the given characters with backslashes to make a legit regex
      • syn match Repeat "^...$" :: add the given string to the Repeat syntax group
  • nohlsearch :: remove highlighting from the search done for g//
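
The escaping step is the part most likely to trip on unusual lines, so here is a minimal sketch of it in isolation. Both function names are hypothetical and not part of the answer. The first mirrors the one-liner's escape() call; the second is an alternative that assumes "very nomagic" mode (\V), where only the backslash needs escaping. Note that ~ is also special in Vim's default magic mode and is not in the one-liner's escape list, so the second form may be safer for lines containing it. When used inside syn match "...", the quote delimiter would still need escaping as well.

```vim
" Sketch only: EscapedLinePattern and LiteralLinePattern are hypothetical names.

" Same approach as the one-liner: backslash-escape the listed magic characters
" and anchor the result to the whole line.
function! EscapedLinePattern(text) abort
  return '^' . escape(a:text, '".\^$*[]') . '$'
endfunction

" Alternative (an assumption, not from the answer): very nomagic mode.
" After \V only the backslash is special; \^ and \$ are the line anchors.
function! LiteralLinePattern(text) abort
  return '\V\^' . escape(a:text, '\') . '\$'
endfunction
```

Either result can be checked against a string with =~#, for example :echo 'a.b*c' =~# LiteralLinePattern('a.b*c').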

Justin's non-regex method is probably faster:

function! HighlightRepeats() range
  let lineCounts = {}
  let lineNum = a:firstline
  while lineNum <= a:lastline
    let lineText = getline(lineNum)
    if lineText != ""
      let lineCounts[lineText] = (has_key(lineCounts, lineText) ? lineCounts[lineText] : 0) + 1
    endif
    let lineNum = lineNum + 1
  endwhile
  exe 'syn clear Repeat'
  for lineText in keys(lineCounts)
    if lineCounts[lineText] >= 2
      exe 'syn match Repeat "^' . escape(lineText, '".\^$*[]') . '$"'
    endif
  endfor
endfunction

command! -range=% HighlightRepeats <line1>,<line2>call HighlightRepeats()
g3cko
rampion
78

None of the answers above worked for me so this is what I do:

  1. Sort the file using :sort
  2. Execute command :g/^\(.*\)$\n\1$/p
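
Since :sort reorders the buffer, here is a minimal sketch of the same sort-then-compare-neighbours idea applied to a copy of the lines, so the buffer itself stays untouched. DuplicateLines is a hypothetical name, and returning each duplicated line once is an assumption about what is wanted.

```vim
" Sketch only: DuplicateLines is a hypothetical helper, not part of the answer.
" Returns each line text that occurs more than once in the given list.
function! DuplicateLines(lines) abort
  " Sort a copy so equal lines become adjacent (what :sort does in place).
  let sorted = sort(copy(a:lines))
  let dups = []
  let i = 1
  while i < len(sorted)
    " Same test as the :g pattern: this line equals the one before it.
    if sorted[i] ==# sorted[i - 1] && index(dups, sorted[i]) < 0
      call add(dups, sorted[i])
    endif
    let i += 1
  endwhile
  return dups
endfunction
```

For the current buffer this could be called as :echo DuplicateLines(getline(1, '$')).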
Krisztián Balla
  • 1
    Thank you. I feel this is a better approach. With this we can find duplicate lines as well as customize up to the required length – harsha Jun 11 '16 at 04:52
  • 2
    I love how simple this is! – Matt Wanchap Feb 12 '20 at 01:26
  • I'm going to need a hand with the explanation. `g`: global command (run through each line from top to bottom in this case); `^\(.*\)$`: capture the entire line; `\n`: followed by the newline character; `\1$`: followed by the captured line again (to the end). This part just checks if the next line is the same as the current line. But why does my Vim highlight multiple separate occurrences, and what is the `p` (paste) for? – Ari Feb 25 '21 at 02:53
  • @Ari: `p` is for "print". Actually you can remove that, because it is the default command. Not sure what you mean by "why does my Vim highlight multiple separate occurrences". Vim should highlight/print duplicate lines. – Krisztián Balla Feb 25 '21 at 08:46
  • Tried using `vim -u NONE` and it opens a split pane with the results. I understand what you mean by `print` in this context. One of my plugins shows this https://imgur.com/a/6Qu8Q65 with highlighted rows. On my other Vim instance it didn't show the bottom split and only highlighted rows instead. – Ari Feb 25 '21 at 12:10
  • @Ari my Vim is also opening a pane for the results. Not sure why. I'm unfortunately not an expert in configuring Vim. – Krisztián Balla Feb 27 '21 at 09:57
18
  1. :sort and save it in file1.
  2. :sort u and save it in file2.
  3. gvimdiff or tkdiff the two files.
Martin Tournoij
user7989979
4

Why not use:

V*

in normal mode.

It simply searches for all matches of the current line, thus highlighting them (if the 'hlsearch' setting is enabled, which I think is the default). Besides, you can then use

n

to navigate through the matches.

Lonecat
  • 1
    Visual mode doesn't support * by default. It's probably a function you have in your .vimrc. Something like this: xno * :calVisualSearch()/ xno # :calVisualSearch()? fun! s:VisualSearch() let old = @" | norm! gvy let @/ = '\V'.substitute(escape(@", '\'), '\n', '\\n', 'g') let @" = old endf) – Michael Aug 17 '09 at 06:11
  • Arg, the formatting messed up. Here's what I meant: http://pastebin.com/f2ee37c92 – Michael Aug 17 '09 at 06:14
  • It would only match one thing at a time, whereas I'd prefer to indicate all lines that are duplicates of other lines all at once. Nice function though, seems handy. – Brian Carper Aug 17 '09 at 19:37
2

Run through the list once, make a map of each string and how many times it occurs. Loop through it again, and append your * to any string that has a value of more than one in the map.
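
A minimal sketch of that two-pass idea in Vim script, since the question is about a Vim buffer. MarkRepeats, its parameter names, and the :MarkRepeats command are illustrative assumptions, not code from this answer. Like HighlightRepeats in the answer above, it skips empty lines.

```vim
" Sketch only: MarkRepeats and its arguments are hypothetical names.
" Appends `marker` to every non-empty line in the range that occurs
" more than once in that range; line order is left unchanged.
function! MarkRepeats(startLine, endLine, marker) abort
  let counts = {}
  " Pass 1: count how often each line's text occurs.
  for text in getline(a:startLine, a:endLine)
    if text !=# ''
      let counts[text] = get(counts, text, 0) + 1
    endif
  endfor
  " Pass 2: append the marker to every line counted at least twice.
  for lnum in range(a:startLine, a:endLine)
    let text = getline(lnum)
    if text !=# '' && get(counts, text, 0) >= 2
      call setline(lnum, text . a:marker)
    endif
  endfor
endfunction

" Assumed convenience command: mark the whole buffer (or a range) with '*'.
command! -range=% MarkRepeats call MarkRepeats(<line1>, <line2>, '*')
```

Running it a second time would append another marker, because the already marked lines are still duplicates of each other.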

Justin
2

Try:

:%s:^\(.\+\)\n\1:\1*\r\1:

Hope this works.

Update: next try.

:%s:^\(.\+\)$\(\_.\+\)^\1$:\1\r\2\r\1*:
Zsolt Botykai
  • This will only detect adjacent duplicate lines, and will only mark the first copy, not the second. – rampion Aug 13 '09 at 13:20