I want to use a regex in Ruby to capture plain-text email addresses but NOT email addresses wrapped in mailto link tags (like `<a href="" class="">a@b.com</a>`). I tried `source.gsub(/(?!<$)[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i)`, but this does not work.
-
Are you parsing HTML? Why not use an HTML parser and simply grab the text content of that element? – Mark Thomas Jun 07 '17 at 02:21
-
Are you aware that you [can't parse HTML with a regexp](https://stackoverflow.com/a/1732454/2483313), because HTML isn't a regular language? – spickermann Jun 07 '17 at 06:32
1 Answer
Here's the regex; I took the liberty of using a different email selector as well:
/(\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-])(?!<\/a>)*$/i
So something like `source.gsub(/(\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-])(?!<\/a>)*$/i)`.
The trailing `(?!<\/a>)*$` basically says to ignore the match if it ends in `</a>`. It might be more efficient, though, to just filter out any `<a></a>` tags first if you're expecting multiple email addresses per line or document.
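The "filter out the `<a></a>` tags first" idea could look roughly like the sketch below. It is only an illustration: the names `EMAIL`, `ANCHOR`, and `plain_text_emails` are made up here, the email pattern is the one from the question, and the anchor pattern assumes simple, non-nested `<a>` elements rather than arbitrary HTML.

```ruby
# Email pattern taken from the question (case-insensitive).
EMAIL = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i

# Assumption: anchors are simple, non-nested <a ...>...</a> elements.
# /m lets "." span newlines; ".*?" stops at the first closing </a>.
ANCHOR = %r{<a\b[^>]*>.*?</a>}im

# Hypothetical helper: drop every <a> element (including its href),
# then collect the email addresses left in the remaining text.
def plain_text_emails(source)
  source.gsub(ANCHOR, " ").scan(EMAIL)
end

text = 'Contact x@y.org or <a href="mailto:a@b.com" class="">a@b.com</a> now'
p plain_text_emails(text) # => ["x@y.org"]
```

Because the whole element is removed before scanning, an address inside the `href` attribute is skipped as well as the link text.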

OneNeptune
-
Thanks for your answer. Unfortunately, the regex captures neither the email address followed by the closing tag nor the plain-text one... – shawn Jun 07 '17 at 01:50
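If the goal is a single `gsub` that rewrites only the plain-text addresses, one possible approach (a sketch, not a tested general solution) is to match either a whole `<a>` element or an email address, and to leave the `<a>` element untouched in the block. The names `EMAIL_PATTERN`, `ANCHOR_OR_EMAIL`, and `replace_plain_emails` are hypothetical, and the same simple-anchor assumption applies.

```ruby
# Email pattern taken from the question.
EMAIL_PATTERN = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i

# First alternative consumes a whole <a>...</a> element (no capture group);
# second alternative captures a plain-text email address in group 1.
ANCHOR_OR_EMAIL = %r{<a\b[^>]*>.*?</a>|(#{EMAIL_PATTERN.source})}im

# Hypothetical helper: yields each plain-text email to the caller's block
# and substitutes the block's result; <a> elements are returned unchanged.
def replace_plain_emails(source)
  source.gsub(ANCHOR_OR_EMAIL) do |match|
    email = Regexp.last_match(1)
    email ? yield(email) : match
  end
end

text = 'x@y.org and <a href="mailto:a@b.com">a@b.com</a>'
puts replace_plain_emails(text) { |email| "[#{email}]" }
# prints: [x@y.org] and <a href="mailto:a@b.com">a@b.com</a>
```

This works because the regex engine consumes the entire `<a>` element as one match, so the address inside it is never offered to the email alternative. For anything beyond simple markup, an HTML parser, as suggested in the comments on the question, is the safer route.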