I'm new to regex and am currently writing a Scrapy crawler to collect e-mail addresses.
I want to be able to match different e-mail formats when I crawl. Right now I just match anything containing an @ sign, but I'd like to be a little bit smarter.
How do I select e-mails with the following formats?
- info@example.com
- info [at] example [dot] com
- info at example.com
- info at example dot com
Here is what I currently have:
item['mail'] = hxs.select('//body//text()').re(r'[\w.-]+@[\w.-]+')
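For reference, here is a rough sketch of the direction I think this could go, in plain Python so it can be tested outside Scrapy. The names `AT`, `DOT`, `EMAIL_RE`, `normalize` and `extract_emails` are made up for illustration, and the separator alternatives only cover the four formats listed above. It assumes the page text has already been joined into a single string.

```python
import re

# Ways the "@" can be written. Assumption: only "@", "[at]" and a bare
# " at " surrounded by whitespace need to be recognised.
AT = r'(?:\s*@\s*|\s*\[at\]\s*|\s+at\s+)'

# Ways the "." between domain labels can be written.
DOT = r'(?:\.|\s*\[dot\]\s*|\s+dot\s+)'

# Local part, an "at" separator, then two or more domain labels joined by
# a "dot" separator. Requiring a final label after each dot keeps a
# sentence-ending period out of the match.
EMAIL_RE = re.compile(
    r'([\w.-]+)' + AT + r'([\w-]+(?:' + DOT + r'[\w-]+)+)',
    re.IGNORECASE,
)


def normalize(match):
    """Rebuild a plain local@domain string from an obfuscated match."""
    local, domain = match.group(1), match.group(2)
    # Collapse every "dot" spelling inside the domain to a literal ".".
    domain = re.sub(DOT, '.', domain, flags=re.IGNORECASE)
    return local + '@' + domain


def extract_emails(text):
    """Return all addresses found in text, normalised to local@domain."""
    return [normalize(m) for m in EMAIL_RE.finditer(text)]


if __name__ == '__main__':
    samples = [
        'info@example.com',
        'info [at] example [dot] com',
        'info at example.com',
        'info at example dot com',
    ]
    for sample in samples:
        print(sample, '->', extract_emails(sample))
```

With the selector above, I assume the input could be built with something like ' '.join(hxs.select('//body//text()').extract()). One caveat: the bare " at " / " dot " forms will also match ordinary phrases such as "look at example dot com", so the results would probably still need some filtering.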