5

I have a .txt document which consists of one word followed up with a date in one line, and so on in each line.

How can Notepad++ recognize same words in different lines and delete duplicate lines?

Peter Mortensen
xcyteh
  • Duplicate of [Removing duplicate rows in Notepad++](http://stackoverflow.com/questions/3958350/removing-duplicate-rows-in-notepad) – florisla Aug 25 '14 at 09:05

4 Answers

7

Not a direct answer to your question, but I found this page through its title while looking for a way to simply delete duplicate lines. Here is an easy way to do that:

  1. Select all the text (Ctrl+A). Click TextFX → Click TextFX Tools → Check +Sort outputs only UNIQUE (at column) lines (if not already checked).
  2. Click TextFX → Click TextFX Tools → Click Sort lines case insensitive (at column).
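
For readers without TextFX, the effect of these two steps can be approximated outside the editor. The following is a minimal Python sketch, not TextFX's actual implementation: the function name `sort_unique_case_insensitive` is hypothetical, and it assumes duplicates are detected case-insensitively, which may differ from what the plugin does.

```python
def sort_unique_case_insensitive(lines):
    """Sort lines ignoring case and keep only the first of each group of
    lines that are equal when case is ignored (an assumption about TextFX)."""
    seen = set()
    result = []
    # sorted() is stable, so lines that differ only in case keep their
    # original relative order.
    for line in sorted(lines, key=str.lower):
        key = line.lower()
        if key not in seen:
            seen.add(key)
            result.append(line)
    return result


if __name__ == "__main__":
    for line in sort_unique_case_insensitive(["b", "A", "a", "B", "c"]):
        print(line)
```

Because whole lines are compared, two lines with the same word but different dates are both kept, which is why this is not a direct answer to the question.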
alexjhart
4

Assuming the dates can differ between occurrences of the same word and you want to keep the occurrence that appears first in the file, this should work (make sure your file ends with a newline):

  1. Go to the "Replace" dialog (you can do Ctrl+F and go to replace tab).
  2. In the "Search Mode" at the bottom select "Regular expression" (make sure ". matches newline" is not selected).
  3. In the "Find what:" field, type `(\s*\w+ )(.*\r\n)((.*\r\n)*)\1.*\r\n`
  4. In the "Replace with:" field, type `\1\2\3`
  5. Click "Replace" until there are no more occurrences. "Replace All" does not seem to work for this; there may be a better regex for which it does, but I have not found one.
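
The repeated clicking in step 5 can be imitated in a script. This is a rough Python sketch that uses the same pattern with the standard `re` module. Python's regex engine is not the one Notepad++ uses, so treat the matching behaviour as an assumption; the function name `remove_later_duplicates` and the sample text are made up for illustration.

```python
import re

# Same expression as in the "Find what:" field; it expects \r\n line endings.
PATTERN = re.compile(r"(\s*\w+ )(.*\r\n)((.*\r\n)*)\1.*\r\n")


def remove_later_duplicates(text):
    """Apply one replacement at a time until nothing matches any more,
    like clicking "Replace" repeatedly."""
    while True:
        text, count = PATTERN.subn(r"\1\2\3", text, count=1)
        if count == 0:
            return text


# Hypothetical sample: "apple" occurs three times.
SAMPLE = "apple 1\r\npear 2\r\napple 3\r\napple 4\r\n"

if __name__ == "__main__":
    print(repr(remove_later_duplicates(SAMPLE)))  # 'apple 1\r\npear 2\r\n'
```

In this sketch a single `PATTERN.sub` pass over the sample removes only one of the two later `apple` lines, because the first match already spans the text up to and including the duplicate it removes. That may be related to why "Replace All" appears not to work, though this is a guess.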

I've tested this on the file:

testing330     05:09-24/08
whatever     10:55-25/08
testing     15:57-26/08
testing667     19:22-30/08
linux     00:29-31/08
testing330     00:29-31/08
windows     12:25-31/08

And the result was:

testing330     05:09-24/08
whatever     10:55-25/08
testing     15:57-26/08
testing667     19:22-30/08
linux     00:29-31/08
windows     12:25-31/08
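
The same keep-the-first result can also be reached without a regex by remembering which leading words have already been seen. Below is a minimal Python sketch, assuming the word is the first whitespace-separated token on each line; `keep_first_per_word` is a hypothetical name, and the sample is the test file above.

```python
def keep_first_per_word(lines):
    """Keep only the first line for each leading word; blank lines are kept."""
    seen = set()
    kept = []
    for line in lines:
        parts = line.split()
        if not parts:
            kept.append(line)  # design choice: leave blank lines untouched
            continue
        word = parts[0]
        if word not in seen:
            seen.add(word)
            kept.append(line)
    return kept


TEST_FILE = [
    "testing330     05:09-24/08",
    "whatever     10:55-25/08",
    "testing     15:57-26/08",
    "testing667     19:22-30/08",
    "linux     00:29-31/08",
    "testing330     00:29-31/08",
    "windows     12:25-31/08",
]

if __name__ == "__main__":
    print("\n".join(keep_first_per_word(TEST_FILE)))
```

Unlike the regex, this does not depend on the line-ending style or on the file ending with a newline.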
Peter Mortensen
SamYonnou
  • This is exactly what I needed, but the formatting is a bit different. How would you do it for this example: http://pastebin.com/ZbtGeaTX Note that testing330 is the only duplicate, at a different time (keep the first). – xcyteh Sep 12 '13 at 16:23
  • The method I posted should work for your example as well (assuming each of those entries is on a separate line). It does not care about the format of the date. However, if you are running this on a *nix system, it will probably have to be changed a bit (replacing each `\r\n` in the code with just `\n` should do it). – SamYonnou Sep 12 '13 at 16:28
  • Edit: fixed to allow for spaces at the beginning of each line. Also changed it so that it removes the entire duplicate line and not just the text on it (for this to work properly, make sure the file ends in a new/empty line). – SamYonnou Sep 12 '13 at 16:34
  • Is there any way to *not* leave one occurrence? – Cullub Sep 23 '14 at 11:40
2

You can use EditPlus on Windows or TextWrangler on Mac to sort lines and remove duplicates easily.

In Notepad++ versions after 6.5.2 (free), you can sort lines, or you can install the plugin "TextFX Characters" using the "Plugin Manager".

TextFX includes numerous features to transform selected text. Featuring:

  • Interactive Brace Matching
  • Quote handling
  • Character case alternation
  • Text rewrap
  • Column Lineup
  • Fill Text Down
  • Insert counter text down
  • Text to code conversion
  • Numeric Conversion
  • URI & HTML encoding
  • HTML to text conversion
  • Submit text to W3C
  • Text sorting
  • Ascii Chart
  • Leading whitespace repair
  • Autoclose HTML & braces

Homepage: http://textfx.no-ip.com/textfx/

lynx_74
1

For me personally, here are the steps I follow. Let's assume you have only 1 column of data in column A.

  1. Import the data into Excel.
  2. Sort the data.
  3. Insert a function to check for duplicates. Cell B2 would be: =IF(A2=A1,"Duplicate","")
  4. Select all of column B.
  5. Copy.
  6. Paste special and paste the values.
  7. Sort the data according to column B.
  8. Delete all the ones marked with "Duplicate".
  9. Copy the data back to Notepad++
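
The Excel steps above amount to sorting, flagging every value that equals the one before it, and dropping the flagged rows. Here is a minimal Python sketch of that idea for a single column; the function name `dedupe_sorted` is hypothetical, and unlike Excel's `=` on text, the comparison here is case-sensitive.

```python
def dedupe_sorted(values):
    """Sort the values, flag each one equal to its predecessor
    (like =IF(A2=A1,"Duplicate","")), and drop the flagged ones."""
    ordered = sorted(values)
    flags = [
        "Duplicate" if i > 0 and ordered[i] == ordered[i - 1] else ""
        for i in range(len(ordered))
    ]
    return [value for value, flag in zip(ordered, flags) if flag == ""]


if __name__ == "__main__":
    print(dedupe_sorted(["b", "a", "b", "c", "a"]))  # ['a', 'b', 'c']
```

As with the spreadsheet method, the original line order is lost because the data is sorted first.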

I thought there was a plugin like this, but can't find it now. Otherwise, this link may help you.

  • Using TextFX helped a lot. Even though I could use Excel, copy-pasting all the documents would take a lot of time. – xcyteh Sep 12 '13 at 16:20