Regular expression for an email address not ending with </a>

Question

I want to use a regex in Ruby to capture plain-text email addresses, but NOT email addresses wrapped in mailto link tags (like <a href="" class="" >a@b.com</a>). I tried source.gsub(/(?!<$)[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i), but it does not work.

Are you parsing HTML? Why not use an HTML parser and simply grab the text content of that element? — Mark Thomas, Jun 07 '17 at 02:21
Are you aware that you [can't parse HTML with a regexp](https://stackoverflow.com/a/1732454/2483313), because HTML isn't a regular language? — spickermann, Jun 07 '17 at 06:32

score 0 · Answer 1 · answered Jun 07 '17 at 01:33

Here's the regex; I took the liberty of using a different email selector as well:

/(\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-])(?!<\/a>)*$/i

So something like source.gsub(/(\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-])(?!<\/a>)*$/i)

The (?!<\/a>)*$ part is meant to skip a match that ends in </a>. If you expect multiple email addresses per line or document, though, it might be more efficient to filter out any <a></a> tags first.
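The "filter out the <a></a> tags first" idea could look roughly like the sketch below. It is not from the original answer. The helper name plain_text_emails and the ANCHOR_PATTERN regex are made up for illustration, and it assumes simple, non-nested <a> elements. The email pattern is the one from the question. For real HTML, a proper parser is the safer choice.

```ruby
# Email pattern taken from the question.
EMAIL_PATTERN = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i

# Assumption: anchors are simple and not nested. /m lets .*? span newlines.
ANCHOR_PATTERN = /<a\b[^>]*>.*?<\/a>/mi

# Hypothetical helper: returns the email addresses that appear as plain text,
# skipping anything inside an <a>...</a> element (href value and link text).
def plain_text_emails(source)
  # Remove whole anchor elements first so they are never scanned.
  without_links = source.gsub(ANCHOR_PATTERN, ' ')
  # With no capture groups, scan returns an array of the matched strings.
  without_links.scan(EMAIL_PATTERN)
end

source = 'Write to a@b.com or <a href="mailto:c@d.com" class="">c@d.com</a>, or e@f.org.'
p plain_text_emails(source)  # prints ["a@b.com", "e@f.org"]
```

Removing the whole element, rather than adding a lookahead after the address, also keeps the address inside href="mailto:..." from being picked up.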

answered Jun 07 '17 at 01:33

OneNeptune


Thanks for your answer. Unfortunately, the regex captures neither the email address that ends with the tag nor the plain-text one... – shawn Jun 07 '17 at 01:50
