
I searched around and only managed to come up with this: `\b(\w+)\b([\w\W]*)\b\1\b`, replacing with `$1$2`.

However, it only works for removing repeated words. For example, if you have:

word1, word2, word1, word2, word3
*you get:*
word1, word2, word3

What I want is if you have:

"i love you","i love you too", "i love you", "i love you so much"

I should get:

"i love you","i love you too", "i love you so much"
dda
lobjc

1 Answer


Your regex matches a whole word (captured in Group 1), then any 0+ characters (Group 2) up to the last occurrence of that same whole word.
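For illustration, here is a minimal sketch that runs the question's pattern through Python's `re` module, assumed here as a stand-in for the editor's engine (Python writes the replacement `$1$2` as `\1\2`; the function name is hypothetical). Because `[\w\W]*` is greedy, one pass removes only the last repeat of the first word it finds, and the separator in front of that repeat stays behind:

```python
import re

# The question's pattern: a whole word (Group 1), then any text (Group 2),
# then the last later occurrence of that same whole word.
WORD_DUPE = re.compile(r"\b(\w+)\b([\w\W]*)\b\1\b")


def remove_word_dupes_once(text: str) -> str:
    # One "Replace All" pass: keep Group 1 and Group 2, drop the repeat.
    return WORD_DUPE.sub(r"\1\2", text)


print(remove_word_dupes_once("word1, word2, word1, word2, word3"))
# word1, word2, , word2, word3
```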

You now need a regex where each word boundary is replaced with `"` and the `\w` pattern with `[^"]` (any char other than `"`). Additionally, an optional comma and any whitespace after the duplicate can be matched, so the delimiter is removed together with it.

Find what: ("(?!\s*,\s*")[^"]+")(.*)\1,?\s*
Replace with: $1$2
The ". matches newline" option must be ON if your dupes may appear on different lines.
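A minimal sketch of why that option matters, again assuming Python's `re` behaves like the editor here, with `re.DOTALL` standing in for the ". matches newline" option and a made-up sample: `[^"]` already crosses line breaks, but the `(.*)` between a value and its repeat does not unless the flag is set.

```python
import re

# The answer's pattern. [^"] matches newlines anyway, but (.*) only does
# when re.DOTALL (the ". matches newline" option) is set.
PATTERN = r'("(?!\s*,\s*")[^"]+")(.*)\1,?\s*'

# Hypothetical input: "x" is repeated on a later line.
text = '"x",\n"y",\n"x",\n"z"'

# Without DOTALL, (.*) stops at the first line break, so nothing is removed.
print(repr(re.sub(PATTERN, r"\1\2", text)))
# With DOTALL, the repeated "x" and its trailing comma and newline are removed.
print(repr(re.sub(PATTERN, r"\1\2", text, flags=re.DOTALL)))
```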

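The `(?!\s*,\s*")` negative lookahead fails any match that would start at a `"` followed by a comma and another `"` (the `", "` or `","` delimiters), so the field delimiters are never treated as quoted values and removed.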
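To see what the lookahead guards against, here is a small sketch in Python's `re` (assumed equivalent to the editor for these constructs) on a hypothetical input that has no duplicate values at all, only repeated delimiters:

```python
import re

WITHOUT_LOOKAHEAD = r'("[^"]+")(.*)\1,?\s*'
WITH_LOOKAHEAD = r'("(?!\s*,\s*")[^"]+")(.*)\1,?\s*'

# No duplicate values here, only repeated "," delimiters.
text = '"a","b","c"'

# Without the lookahead, the delimiter "," is captured as a "value" and its
# later repeat is deleted, gluing "b" and "c" together.
print(re.sub(WITHOUT_LOOKAHEAD, r"\1\2", text))  # "a","bc"
# With the lookahead, a quote followed by a comma and a quote cannot start a match.
print(re.sub(WITH_LOOKAHEAD, r"\1\2", text))     # "a","b","c"
```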

You will need to click Replace All several times to remove all dupes.
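The repeated Replace All can be emulated with a loop that reapplies the substitution until the text stops changing. This is a sketch in Python's `re`, assumed to match the editor's behavior for this pattern; `remove_quoted_dupes` is a hypothetical helper name. The loop always terminates, because every successful replacement drops at least the quoted repeat itself:

```python
import re

# The answer's pattern with the ". matches newline" option (re.DOTALL).
DUPE = re.compile(r'("(?!\s*,\s*")[^"]+")(.*)\1,?\s*', re.DOTALL)


def remove_quoted_dupes(text: str) -> str:
    # Emulate clicking "Replace All" until nothing changes any more.
    while True:
        new_text = DUPE.sub(r"\1\2", text)
        if new_text == text:
            return text
        text = new_text


sample = '"i love you","i love you too", "i love you", "i love you so much"'
print(remove_quoted_dupes(sample))
# "i love you","i love you too", "i love you so much"
```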

See the example screenshot, where "he loves you" and "i love you" are removed.


Wiktor Stribiżew
  • Thanks. I notice that when I click 'Replace All' several times, it deletes ALL strings except the first one in the text, so the first string is the only one that remains. E.g. if I have: 'i love you','i love you all','i love you','i love you too',... the only string that remains is 'i love you' – lobjc Dec 29 '16 at 08:30
  • One question to clarify: do you have *words* inside double quotes, and are these quotes paired? You might actually need word boundaries, as in `("\b[^"]+\b")(.*)\1,?\s*` – Wiktor Stribiżew Dec 29 '16 at 08:33
  • Or, there are two other approaches: 1) if you have no commas inside `""`, add the comma to the negated character class: `("[^,"]+")(.*)\1,?\s*`. 2) Make sure you do not match `"` if it is followed by `,"`: `("(?!\s*,\s*")[^"]+")(.*)\1,?\s*` – Wiktor Stribiżew Dec 29 '16 at 08:42
  • I am using EditPad and found out that I had accidentally clicked on the line option (see picture). Could this have contributed to that? – lobjc Dec 29 '16 at 08:52
  • No, the issue is that the current regex matches `", "`. I suggest using `("(?!\s*,\s*")[^"]+")(.*)\1,?\s*` to avoid such matches. Ouroborus's approach with `[^,"]` has one flaw: it won't match quoted substrings that contain commas. My approach with the negative lookahead matches those, too. – Wiktor Stribiżew Dec 29 '16 at 08:54
  • ...can't insert image
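The trade-off between the `[^,"]` class and the negative lookahead discussed in the comments can be checked with a short sketch, assuming Python's `re` as a stand-in for EditPad and a hypothetical sample in which the duplicated value contains a comma:

```python
import re

COMMA_CLASS = re.compile(r'("[^,"]+")(.*)\1,?\s*', re.DOTALL)  # [^,"] variant
LOOKAHEAD = re.compile(r'("(?!\s*,\s*")[^"]+")(.*)\1,?\s*', re.DOTALL)

# The duplicated value itself contains a comma.
text = '"a, b", "a, b", "c"'

# Unchanged: [^,"]+ cannot get past the comma inside "a, b".
print(COMMA_CLASS.sub(r"\1\2", text))
# The repeated "a, b" and its trailing ", " are removed.
print(LOOKAHEAD.sub(r"\1\2", text))  # "a, b", "c"
```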