I need to extract the href from HTML documents. Most of them have a single href, so the regex I have works, but when a document has more than one (as in the following example), I get the wrong one (the email address). Is there a way to extract only the href that starts with 'http://...' and skip the one that contains an email address?
The regex I'm using is:
<a\s+(?:[^>]*?\s+)?href={"}([^ {"}]*){"}
The two hrefs I have are (I need the first one):
<a style='color: black; text-decoration: none; border: 2px solid black; padding: 13px; width: 220px; display: block; text-align: center; margin: 20px 0; font-size: 15px; font-weight: bold;' href='http://ggg.gggg.com/ls/click?upn=ggg'>Verify my account</a>
<a href="mailto:noreply@ggg.com">noreply@ggg.com</a>
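For illustration, here is a minimal Python sketch of one way to do this. It assumes the href value is wrapped in either single or double quotes and that the wanted links start with http:// or https://. The pattern above appears to accept only a double-quoted value, which may be why the single-quoted first link is skipped and the mailto: one is returned. The names `HREF_HTTP`, `extract_http_hrefs` and `sample` are made up for this example.

```python
import re

# Assumption: href values are quoted with ' or " and the wanted links
# start with http:// or https://. mailto: links never match the scheme.
HREF_HTTP = re.compile(
    r"""<a\s+(?:[^>]*?\s+)?href=(["'])(https?://[^"']*)\1""",
    re.IGNORECASE,
)


def extract_http_hrefs(html):
    """Return every http(s) href found in <a> tags, in document order."""
    return [match.group(2) for match in HREF_HTTP.finditer(html)]


# The two links from the question (style attribute shortened for brevity).
sample = """
<a style='color: black; text-decoration: none;' href='http://ggg.gggg.com/ls/click?upn=ggg'>Verify my account</a>
<a href="mailto:noreply@ggg.com">noreply@ggg.com</a>
"""

print(extract_http_hrefs(sample))  # ['http://ggg.gggg.com/ls/click?upn=ggg']
```

The backreference `\1` makes the closing quote match the opening one, and requiring the `https?://` prefix inside the capture group is what filters out the mailto: link. For messy real-world HTML, a proper parser is usually more robust than a regex.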