I've been trying to find a way to use a regex (regular expression) to remove duplicate emails from a text file, but I can't get anything to work.

This is how the emails appear in the text file (an example):

examp@asdas.com
kork@kruu.com
gexx@moxx.com
hey@hayhay.cu
examp@asdas.com
geexx@modxx.com

I haven't found a way to delete all duplicates; I've only found a regular expression that deletes duplicates that come right after each other.

Does anyone have any suggestions?

hennessy
You will find a helpful answer at [removing-duplicate-rows-in-notepad++](http://stackoverflow.com/a/3958364/1521627) – Tuna Jun 05 '13 at 15:36

1 Answer

How about:

Search for: ([^@]+@[^@]+)(.*?)\1
Replace with: $1$2

Explanation of the regex:

The regular expression:

(?-imsx:([^@]+@[^@]+)(.*?)\1)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^@]+                    any character except: '@' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    @                        '@'
----------------------------------------------------------------------
    [^@]+                    any character except: '@' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \1                       what was matched by capture \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
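
The pattern above is not anchored to line boundaries, and `.` does not cross newlines under the flags shown. It may therefore miss duplicates on non-adjacent lines, or match only part of an address. As an alternative, here is a minimal sketch in Python, assuming the file can be processed with a script rather than an editor's search-and-replace. The function names `dedupe_keep_first` and `dedupe_regex` and the `SAMPLE` text are made up for illustration. The first function avoids regex entirely; the second uses a line-anchored pattern that drops a line whenever the same whole line occurs again later, so it keeps the last occurrence rather than the first.

```python
import re

SAMPLE = (
    "examp@asdas.com\n"
    "kork@kruu.com\n"
    "gexx@moxx.com\n"
    "hey@hayhay.cu\n"
    "examp@asdas.com\n"
    "geexx@modxx.com"
)


def dedupe_keep_first(text: str) -> str:
    """Keep the first occurrence of each non-empty line, preserving order."""
    seen = set()
    kept = []
    for line in text.splitlines():
        key = line.strip()
        if key and key in seen:
            continue  # an identical address was already kept
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)


# ^(.+)\n           a whole non-empty line and its newline
# (?=(?:.*\n)*?     look ahead over zero or more complete lines...
#    \1$)           ...to a line that is exactly the captured text
DUP_LINE = re.compile(r"^(.+)\n(?=(?:.*\n)*?\1$)", re.MULTILINE)


def dedupe_regex(text: str) -> str:
    """Remove every line that reappears later (keeps the last occurrence)."""
    return DUP_LINE.sub("", text)


if __name__ == "__main__":
    print(dedupe_keep_first(SAMPLE))
    print("---")
    print(dedupe_regex(SAMPLE))
```

On the sample from the question, both functions leave five addresses. The near-duplicate `geexx@modxx.com` is untouched because only whole lines are compared. The regex variant rescans the rest of the text for every line, so the set-based version is likely the better choice for large files.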
Toto