241

Say you have the following text:

abc
123
abc
456
789
abc
abc

I want to remove all "abc" lines and just keep one. I don't mind sorting. The result should be like this:

abc
123
456
789
Peter Mortensen
Younes

10 Answers

393

If the order of lines is not important

Sort lines alphabetically, if they aren't already, and perform these steps:
(based on this related question: How do I find and remove duplicate lines from a file using Regular Expressions?)

  1. Control+F

  2. Toggle "Replace mode"

  3. Toggle "Use Regular Expression" (the icon with the .* symbol)

  4. In the search field, type ^(.*)(\n\1)+$

  5. In the "replace with" field, type $1

  6. Click the "Replace All" button.
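
The same search-and-replace can be tried outside the editor. Below is a minimal Python sketch, assuming Python's `re` module handles this pattern the same way VS Code's search does; the function name `dedupe_sorted` is made up for illustration. Note that Python writes the replacement as `\1` rather than `$1`.

```python
import re


def dedupe_sorted(text: str) -> str:
    """Sort the lines, then collapse each run of identical lines into one."""
    sorted_text = "\n".join(sorted(text.split("\n")))
    # Same pattern as in step 4; re.MULTILINE lets ^ and $ match at line boundaries.
    return re.sub(r"^(.*)(\n\1)+$", r"\1", sorted_text, flags=re.MULTILINE)


if __name__ == "__main__":
    print(dedupe_sorted("abc\n123\nabc\n456\n789\nabc\nabc"))
```

Because the lines are sorted first, all duplicates sit next to each other, which is what lets a single pass of the pattern catch them.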

If the order of lines is important so you can't sort

In this case, either resort to a solution outside VS Code (see here), or - if your document is not very large and you don't mind spamming the Replace All button - follow the previous steps, but in steps 4 and 5, enter these:
(based on Remove specific duplicate lines without sorting)

Caution: This blocks on files with many lines (1000+), may cause VS Code to crash, and may introduce blank lines in some cases.

  • search: ((^[^\S$]*?(?=\S)(?:.*)+$)[\S\s]*?)^\2$(?:\n)?

  • replace with: $1

and then click the "Replace All" button as many times as there are duplicate occurrences.

You'll know it's enough when the line count stops decreasing when you click the button. Navigate to the last line of the document to keep an eye on that.
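
For the "solution outside VS Code" route, here is a minimal Python sketch. It does not use the regex above; it swaps in a plain order-preserving pass that keeps the first occurrence of each line. The function name `keep_first_occurrence` is hypothetical.

```python
def keep_first_occurrence(text: str) -> str:
    """Drop every line that already appeared earlier, keeping the original order."""
    # dict preserves insertion order (Python 3.7+), so fromkeys keeps
    # only the first occurrence of each line, in document order.
    return "\n".join(dict.fromkeys(text.split("\n")))


if __name__ == "__main__":
    print(keep_first_occurrence("abc\n123\nabc\n456\n789\nabc\nabc"))
```

This runs in a single pass, so it avoids both the repeated "Replace All" clicks and the slowdown on long files.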

Community
Marc.2377
  • 14
    `((^[^\S\r\n]*?(?=\S)(?:.*)+$)[\S\s]*?)^\2$(?:\r?\n)?` made my vscode crash.... I did a Find in one file 229 lines. :( – Hickory420 Jul 20 '18 at 06:23
  • @Hickory420 I tested on my machine with 1000 lines (20 characters long, random) and got no crash, but indeed a thread blocks with 100% CPU load for a few seconds at each pass. Yeah, this is hardly practical for large files. – Marc.2377 Jul 20 '18 at 09:00
  • Thanks for this. Can you please explain the regex `^(.*)(\n\1)+$`. After removing duplicate rows I want to look at all rows with duplicate first column in the csv and want to modify the regex. – Urvah Shabbir Mar 24 '20 at 07:19
  • 1
    Wow I feel like I'm pretty good at regex and this still blew my mind, great answer!! – derpedy-doo Apr 14 '20 at 21:52
  • @UrvahShabbir, an explanation for that piece of regex is given in the [linked Q&A](https://stackoverflow.com/a/1573425/3258851). Mine is only different in that the `\r?` bit from the other answer is not really necessary. – Marc.2377 Jun 15 '20 at 22:07
  • 2
    It's crazy that we have to do something that complicated when it's a basic "edit > remove duplicates" in Sublime... – dhokas Nov 16 '20 at 14:27
  • 2
    Wow using regex capture groups for replacement is so useful! Thanks – melMass Feb 04 '21 at 15:04
  • this answer really means that the real answer is use sublime text – NOP da CALL Jan 21 '23 at 07:21
  • @NOPdaCALL Back when I wrote this, there was neither an extension nor a native feature that allowed this. The good thing about it now is that the idea transfers to any editor or environment with regex support. – Marc.2377 Jan 22 '23 at 06:47
255

As of VS Code v1.62, there is a command to eliminate duplicate lines from a selection:

Delete Duplicate Lines in the Command Palette

or

editor.action.removeDuplicateLines as a command in a keybinding

(there is no default keybinding for this command)


Here is a very interesting extension: Transformer

Features:

  • Unique Lines As New Document
  • Unique Lines

  • Align CSV
  • Align To Cursor
  • Compact CSV
  • Copy To New Document
  • Count Duplicate Lines As New Document
  • Encode / Decode
  • Filter Lines As New Document
  • Filter Lines
  • Join Lines
  • JSON String As Text
  • Lines As JSON String Array
  • Normalize Diacritical Marks
  • Randomize Lines
  • Randomize Selections
  • Reverse Lines
  • Reverse Selections
  • Rotate Backward Selections
  • Rotate Forward Selections
  • Select Highlights
  • Select Lines
  • Selection As JSON String
  • Sort Lines By Length
  • Sort Lines
  • Sort Selections
  • Split Lines After
  • Split Lines Before
  • Split Lines
  • Trim Lines
  • Trim Selections

Unique Lines

Removes duplicate lines from the document. Operates on the selection, or on the current block if there is no selection.

Unique Lines As New Document

Unique lines are opened in a new document. Operates on the selection, or on the current block if there is no selection.

I haven't played with it much besides the "Unique Lines" command but it seems quite nicely done (including attempting a macro recorder!).

Wok
Mark
  • 1
    @ArenCambre I see `Delete Duplicate Lines`: `editor.action.removeDuplicateLines` in the Keyboard Shortcuts. They forgot to do what they need to do to get it into the Command Palette or a regression but it is there in the Keyboard Shortcuts and can be made into a keybinding as per usual. Did you check the Keyboard Shortcuts? – Mark Jan 31 '22 at 03:15
  • Mentioned in the v1.62 Release Notes: https://code.visualstudio.com/updates/v1_62. – Mark Jan 31 '22 at 03:28
  • It is there. My bad. I've deleted my original comment to prevent confusion. – Aren Cambre Jan 31 '22 at 14:06
  • @ArenCambre It should appear in the Command Palette as well, it does in the Insiders Build. So I assume that is fixed and will be there when another Stable is released shortly. – Mark Jan 31 '22 at 15:39
  • My original, now-deleted comment was a mistake. It appears as expected in 1.63.2. – Aren Cambre Jan 31 '22 at 20:23
  • Works great in 1.64.2, thanks! This should be the accepted answer. – moraleboost Mar 18 '22 at 20:42
  • 2
    Thanks for mentioning "from a selection". I was wondering why nothing was happening… – mpavey Mar 17 '23 at 15:36
53

To add to @Marc.2377 's reply.

If the order is important and you don't mind keeping only the last of the duplicate lines, search for the following regexp to remove only duplicate non-empty lines:

^(.+)\n(?=(?:.*\n)*?\1$)

If you also want to remove duplicate empty lines, use * instead of +

^(.*)\n(?=(?:.*\n)*?\1$)

and replace with nothing.

Screenshot of filled search-and-replace box

This will take a line and try to find ahead some more (maybe 0) lines followed by the exact same line taken. It will remove the taken line.

This is just a one-shot regex. No need to spam the replace button.

This now also takes @awk's comment into account: previously, the last line had to end with a linefeed in order to be identified as a duplicate. That is no longer necessary, because the \n is now excluded from the captured line and a $ is added after the line found.
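
To see the one-shot behaviour outside the editor, here is a minimal Python sketch. It assumes Python's `re` module treats the pattern like VS Code's search does; the function name `keep_last_occurrence` and the `include_empty` switch are made up for illustration.

```python
import re


def keep_last_occurrence(text: str, include_empty: bool = False) -> str:
    """Remove each line that occurs again later, so only the last copy survives."""
    # (.+) skips empty lines; (.*) also treats empty lines as duplicates.
    line = "(.*)" if include_empty else "(.+)"
    # The lookahead lazily skips whole lines until it finds the same line again.
    pattern = "^" + line + r"\n(?=(?:.*\n)*?\1$)"
    return re.sub(pattern, "", text, flags=re.MULTILINE)


if __name__ == "__main__":
    print(keep_last_occurrence("abc\n123\nabc\n456\n789\nabc\nabc"))
```

Since the lookahead does not consume any text, every earlier copy is matched and removed in the same pass, which is why a single "Replace All" is enough.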

Skeeve
  • 1
    Nicely succinct – angus l Mar 07 '19 at 11:05
  • 5
    Nice. I recommend `^(.+\n)(?=(?:.*\n)*?\1)` instead because your regex removed an empty line where it wasn't expected to. Upvoted anyway. – Marc.2377 Mar 10 '19 at 19:46
  • 2
    Good catch… OTOH: duplicate empty lines are also duplicates ;) – Skeeve Mar 11 '19 at 11:31
  • 1
    Thanks to [zaman](https://stackoverflow.com/users/4870357) there is now a screenshot of the search-and-replace box. He also changed the regexp to ignore empty lines. – Skeeve Apr 21 '20 at 05:58
  • 2
    @Skeeve Come on, this just a little thanks for ur helpful answer & All for better community :) – Zaman Jun 23 '20 at 15:49
  • 1
    Can someone please explain the `(?=(?:.*\n)*?` part in the regex. – TrigonaMinima Jul 22 '20 at 18:22
  • 3
    `xxx(?=…)` is a lookahead match. It makes sure that whatever follows "xxx" matches "…", but does not advance the search. `(?:…)` is just a bracket which does not count in the bracket count. `.*\n` is a pattern for a (possibly empty) line. `*` means that there may be several lines, even none. The `?` after the asterisk (`*`) means that we want as few lines as possible. As `\1` follows this expression, the effect is that we look ahead past all the lines which do not match `\1` until we find a line matching `\1`. I hope this makes it clear. – Skeeve Jul 23 '20 at 06:44
  • 2
    This answer worked for me. Just remember before search, to give an empty line at the very end, otherwise doesn’t catch the last line match – Ax_ Mar 21 '22 at 03:24
  • 1
    Thanks @awk for finding that and commenting about it. I changed the regex accordingly so you don't need to take care of the empty line at the end. – Skeeve Mar 21 '22 at 07:32
32

I just had the same issue and found the Visual Studio Code package "Sort lines". See the Visual Studio Code market place for details (e.g. Sort lines).

This package has the option "Sorting lines (unique)", which did it for me. Watch out for whitespace at the beginning or end of lines: it affects whether lines are considered unique.

Peter Mortensen
SimonAx
  • 2
    https://marketplace.visualstudio.com/items?itemName=bibhasdn.unique-lines should also work. – kcpr May 20 '17 at 19:26
  • 1
    It seems like the extension no longer has the ability to remove duplicate entries. Combining it with [the answer](https://stackoverflow.com/a/45829605/31532) by @Marc-2377 seems to do the trick for me. – Dan Atkinson Oct 02 '17 at 20:29
26

Install the DupChecker extension, hit F1, and type "Check Duplicates".

It will check for duplicates and ask if you want to remove them.

Peter Mortensen
perfecto25
22

Try find and replace with a regular expression.

  • Find: ^(.+)((?:\r?\n.*)*)(?:\r?\n\1)$

  • Replace: $1$2

It is possible to introduce some variance in the first group.
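
Here is a minimal Python sketch of this find-and-replace, assuming Python's `re` module matches the pattern the way the editor does. In this sketch a single pass does not remove every duplicate, so the replacement is repeated until the text stops changing; the function name `remove_later_duplicates` is hypothetical.

```python
import re

# Same pattern as the "Find" field; Python writes the replacement as \1\2.
PATTERN = re.compile(r"^(.+)((?:\r?\n.*)*)(?:\r?\n\1)$", re.MULTILINE)


def remove_later_duplicates(text: str) -> str:
    """Re-apply the replacement until a pass no longer changes the text."""
    while True:
        new_text = PATTERN.sub(r"\1\2", text)
        if new_text == text:
            return new_text
        text = new_text


if __name__ == "__main__":
    print(remove_later_duplicates("abc\n123\nabc\n456\n789\nabc\nabc"))
```

This variant keeps the first occurrence of each line and preserves the original order.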

Peter Mortensen
Lavock
5

If you don't mind some Vim in your VS Code, you can install a Vim emulation plugin.

Then you can use the Vim command

:sort u

It sorts the lines and removes duplicates.
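
As a rough equivalent outside the editor, here is a minimal Python sketch of "sort, then keep unique lines". The function name `sort_unique` is made up, and Python's default string ordering is assumed here; it may differ from Vim's in some cases.

```python
def sort_unique(text: str) -> str:
    """Return the distinct lines of text in sorted order."""
    # set() drops duplicates; sorted() then orders the remaining lines.
    return "\n".join(sorted(set(text.split("\n"))))


if __name__ == "__main__":
    print(sort_unique("abc\n123\nabc\n456\n789\nabc\nabc"))
```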

2

Sublime Text 3

It has blisteringly fast native permutation functions.

  • Edit > Permute Lines > Unique or ⇧⌘U, and
  • Edit > Permute Selections > Unique

Visual Studio Code is my daily driver. But, I keep Sublime Text on standby for these situations.

ssent1
0

To remove the duplicate lines in Visual Studio Code:

  1. Select the entire text.

  2. Press Ctrl + Shift + P on Windows and Linux, or Command + Shift + P on macOS.

  3. Type Delete Duplicate Lines and select the option. It filters out the duplicate lines so that each line in the selection is unique.

Ram Chander
-3

Not actually in Visual Studio Code, but if it works, it works.

  1. Open a new Excel spreadsheet
  2. Paste the data into a column
  3. Go to the Data tab
  4. Select the column of data (if you haven't already)
  5. Click Remove Duplicates (somewhat in the middle of the bar)
  6. Click OK to remove duplicates.

It is not the best answer, as you specified Visual Studio Code, but as I said: If it works, it works :)

Peter Mortensen
NostraDavid
  • You could make it more relevant by [providing a script](https://superuser.com/questions/110991/can-you-zip-a-file-from-the-command-prompt-using-only-windows-built-in-capabili/111266#111266) that can be called directly from Visual Studio Code. In other words, one that automates this process. I don't know if it is possible, but I mean a script that would invoke Excel through its exposed COM interfaces. This would make this answer much more valuable, as it would be an example of leveraging other applications to do neat stuff. – Peter Mortensen Jun 09 '20 at 12:42