Use something prebuilt:
require 'uri'
addresses = URI.extract(<<EOT, :mailto)
this is some text. mailto:foo@bar.com and more text
and some more http://foo@bar.com text
href="mailto:someonesname@domain.rr.com"> | Email</a></td>
EOT
addresses # => ["mailto:foo@bar.com", "mailto:someonesname@domain.rr.com"]
URI comes with Ruby, and the pattern used to parse out URIs is well tested. It's not bulletproof, but it works pretty well. If you're getting false positives, you can use a select, reject, or grep block to filter out the unwanted entries returned.
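As a rough sketch of that filtering step, here is the same sample text run through URI.extract and then narrowed down three ways. The idea that anything at bar.com is a false positive is purely an assumption for illustration, and the variable names are made up; swap in whatever rule fits your data.

```ruby
require 'uri'

text = <<EOT
this is some text. mailto:foo@bar.com and more text
and some more http://foo@bar.com text
href="mailto:someonesname@domain.rr.com"> | Email</a></td>
EOT

# Only mailto: URIs are collected; the http:// one is skipped.
found = URI.extract(text, ['mailto'])

# Assumption for this example: addresses at bar.com are unwanted.
unwanted = /@bar\.com\z/

# reject drops entries matching the unwanted pattern.
kept_by_reject = found.reject { |uri| uri.match?(unwanted) }

# select keeps entries that pass a positive test.
kept_by_select = found.select { |uri| uri.end_with?('.rr.com') }

# grep keeps entries matching a pattern, with no block needed.
kept_by_grep = found.grep(/@domain\.rr\.com\z/)
```

All three calls should leave just the domain.rr.com entry. Which one reads best depends on whether it's easier to describe what you want to keep or what you want to throw away.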
If you can't count on having a mailto: prefix, the problem becomes harder, because email addresses aren't simple to parse; there's too much variation in them. The problem is akin to validating an email address using a pattern, because, again, the format for addresses varies too much. "Using a regular expression to validate an email address" and "JavaScript Email Validation when there are (soon to be) 1000's of TLD's?" are good reads for more information.
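To make the difficulty concrete, here is a small sketch using a deliberately loose pattern. LOOSE_EMAIL is a hypothetical pattern written for this example, not something from the URI library and not a recommendation. It both misses an address with a quoted local part and picks up a string that isn't an email address at all.

```ruby
# Hypothetical, deliberately loose pattern: "something@host.tld".
LOOSE_EMAIL = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/

text = 'Write to foo@bar.com or "odd name"@example.org, then install pkg@2.0.1 today'

matches = text.scan(LOOSE_EMAIL)

# The quoted local part ("odd name") is allowed by the address grammar,
# but this pattern cannot see it, so that address is missed entirely.
missed_quoted = matches.none? { |m| m.include?('example.org') }

# "pkg@2.0.1" is a version specifier, not an address, yet it matches.
false_positive = matches.include?('pkg@2.0.1')
```

Tightening the pattern to fix one of these cases tends to break another, which is why filtering known-good URIs is the easier path when a mailto: prefix is available.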