I am using Python's re library to extract email addresses from web pages.
It does its job, but it also extracts links (such as image paths) that happen to match the pattern. For example:
/images/paramproofs/services/pgp/logo_black_16@2x.png
/images/paramproofs/services/twitter/logo_black_16@2x.png
/images/paramproofs/services/github/logo_black_16@2x.png
/images/paramproofs/services/reddit/logo_black_16@2x.png
/images/paramproofs/services/web/logo_black_16@2x.png
/images/paramproofs/services/web/logo_black_16@2x.png
/images/paramproofs/services/stellar/logo_black_16@2x.png
/images/badges/install-badge-windows-168-56@2x.png
/images/badges/install-badge-windows-168-56@3x.png
This is the pattern I use:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[ a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
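
A minimal sketch that reproduces the problem. It assumes a shortened version of the pattern above (only the unquoted local part and the hostname form of the domain) and a made-up HTML snippet. The names LOCAL, DOMAIN, EMAIL_RE and html are illustrative, and alice@example.com is a placeholder address:

```python
import re

# Shortened version of the pattern above (an assumption for illustration):
# only the unquoted local part and the hostname form of the domain are kept.
LOCAL = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
DOMAIN = r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
EMAIL_RE = re.compile(LOCAL + "@" + DOMAIN, re.IGNORECASE)

# Made-up page fragment: one retina image path and one real-looking address.
html = (
    '<img src="/images/badges/install-badge-windows-168-56@2x.png"> '
    "<p>Contact: alice@example.com</p>"
)

# The local part allows "/" and the domain part accepts "2x.png",
# so the image path is returned alongside the address.
matches = EMAIL_RE.findall(html)
for match in matches:
    print(match)
```

Because "/" is a legal character in the local part and "2x.png" has the shape of a hostname, the image path satisfies the pattern just as the address does.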