4

I have 430 HTML files containing the "Contact Us" web pages of different organizations. I was given these files to extract emails from.

This simple regex I came up with finds emails throughout the files:

\S*@\S*

My Problem

I'm trying to select everything besides the emails so I can use Notepad++'s "Replace All in All Opened Documents" function to delete everything besides the emails. Is this possible with regular expressions?

Is there any way I can select everything outside of the regular expression provided above?

narendra-choudhary
  • *Find what*: `(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)|.`. *Replace with*: `$1`. Then, you might want to use *Edit -> Blank Operations -> Remove Unnecessary Blank and EOL* menu option. – Wiktor Stribiżew Jul 26 '16 at 21:26
  • I do not know your level of regex knowledge :) so that I can only suggest doing all lessons at [regexone.com](http://regexone.com/), reading through [regular-expressions.info](http://www.regular-expressions.info), [regex SO tag description](http://stackoverflow.com/tags/regex/info) (with many other links to great online resources), and the community SO post called [What does the regex mean](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean). [Rexegg.com](http://rexegg.com) is also cool. – Wiktor Stribiżew Jul 27 '16 at 21:53

2 Answers

3

Make sure you have a recent version of Notepad++ installed to have the necessary regex support:

Find what : (^|\s+)[^@]+(\s+|$)
Replace with : \n
Search mode : Regular expression

The ". matches newline" option does not influence the result, because the pattern contains no dot.
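To see what this replacement does, here is a minimal sketch that applies the same pattern with Python's re module. This is only an illustration: Python's engine is not the one Notepad++ uses, so edge cases may differ, and the names NON_EMAIL_RUN and isolate_tokens and the sample text are made up for the example.

```python
import re

# Same pattern as in "Find what": a run of text containing no "@",
# bounded by whitespace or by the start/end of the input.
NON_EMAIL_RUN = re.compile(r'(^|\s+)[^@]+(\s+|$)')

def isolate_tokens(text: str) -> str:
    """Replace every '@'-free run with a newline, leaving tokens that contain '@'."""
    return NON_EMAIL_RUN.sub("\n", text)

# Hypothetical sample text, not taken from the original files.
sample = "Contact us at info@example.org or call 555-0100"
out = isolate_tokens(sample)
tokens = out.split()
print(tokens)
```

Note that what survives is the whole whitespace-delimited token around each @, much like \S*@\S* in the question, so any markup glued to an address without a space would likely stay attached to it.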

trincot
1

You need to remove all text that does not match some pattern.

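Match and capture the emails with a (...) capturing group, and add a second alternative that matches any other character so it can be discarded.

Use a pattern of the form (your_pattern)|. and replace with $1. Wherever the email alternative matches, $1 puts the email back; wherever only the dot matches, $1 is empty and the character is removed.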

Or, use:

([^\s<>"]+@[^\s<>"]+)|.

or

(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)|.

Replace with: $1

Then, you might want to use the Edit -> Blank Operations -> Remove Unnecessary Blank and EOL menu option to clean up the leftover empty lines.
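The capture-or-consume idea above can also be tried outside Notepad++. The following is a rough sketch with Python's re module, assuming the first of the two patterns; the names EMAIL_OR_ANY and keep_emails and the sample HTML are hypothetical, and Python's regex engine may not behave identically to the one in Notepad++.

```python
import re

# Group 1 captures an email-like token; the bare "." alternative
# consumes any other single character (except a newline).
EMAIL_OR_ANY = re.compile(r'([^\s<>"]+@[^\s<>"]+)|.')

def keep_emails(text: str) -> str:
    """Emulate replacing with $1: keep group 1 if it matched, else drop the character."""
    return EMAIL_OR_ANY.sub(lambda m: m.group(1) or "", text)

# Hypothetical sample input, one address per line at most.
sample = (
    '<p>Write to <a href="#">info@example.org</a></p>\n'
    '<div>No address here</div>\n'
    '<p>Sales: sales@example.com</p>\n'
)

stripped = keep_emails(sample)
# Dropping the empty lines plays the role of the blank-line cleanup step.
emails = [line for line in stripped.splitlines() if line]
print(emails)
```

Because the dot does not match line breaks here, each address stays on its own line. Two addresses on the same line would be joined with nothing between them, so in that case it may be worth appending a separator to the captured group in the replacement.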

Wiktor Stribiżew