I have been reviewing regexes in Python 3 for capturing email addresses. I have read the documentation on the popular regular-expressions website, as well as the Python docs, SO answers, etc.

I would like to know if this regex is good enough to capture an email address or if I am missing something.

re.findall(r'[\w.+-]+@[\w.+-]+', some_string)
tomordonez
  • That depends on your definition of "good enough". Validating an email address with regex is hard to get right. – Bryan Oakley Apr 02 '18 at 19:08
  • @BryanOakley most email addresses I have seen consist of numbers, letters, dashes, periods, plus signs and underscores. I am not looking to validate a form's email field, but to capture addresses while scraping. The answer below said it would fail to capture ' and &, but I haven't seen such characters in an email address before. "Good enough" means the largest percentage of email addresses, but not all. – tomordonez Apr 02 '18 at 19:39
  • Just be aware that "email addresses I've seen" is not the same as "all valid email addresses". But like I said earlier, it really depends on your definition of "good enough". – Bryan Oakley Apr 02 '18 at 19:43
  • I once saw a blog post by a guy who claimed to have written a complete email address regex, including IPv6, IDNs and so on. It was several screen pages long. – Klaus D. Apr 02 '18 at 20:12
  • @tomordonez Apostrophes are somewhat common in corporate email systems that use address formats like `FIRSTNAME-LASTNAME@example.com`, especially in conjunction with Irish patronymics like `O'Reilly` or `O'Sullivan`. –  Apr 02 '18 at 20:18
  • @duskwuff mm you are right. I didn't think of that :) – tomordonez Apr 02 '18 at 21:47

1 Answer

No. This will fail to match email addresses with certain special characters in the local part. The most common characters this will miss are ' and &, both of which are valid.
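
To see what that means when scraping, here is a minimal sketch comparing the pattern from the question with a slightly wider one. The sample text, the addresses in it, and the `WIDER` pattern are made up for illustration; `WIDER` is only one possible adjustment and still does not cover every valid address. Note that the original pattern does not skip such addresses entirely: it silently captures a truncated tail of them.

```python
import re

# Pattern from the question.
ORIGINAL = re.compile(r"[\w.+-]+@[\w.+-]+")

# Hypothetical wider pattern (an assumption, not a complete solution):
# it also accepts ' and & in the local part and drops + from the domain part.
WIDER = re.compile(r"[\w.+'&-]+@[\w.-]+")

# Made-up sample text with an apostrophe and an ampersand in the local parts.
text = "Contact pat-o'reilly@example.com or r&d@example.com today"

original_hits = ORIGINAL.findall(text)
wider_hits = WIDER.findall(text)

print(original_hits)  # ['reilly@example.com', 'd@example.com']
print(wider_hits)     # ["pat-o'reilly@example.com", 'r&d@example.com']
```

If truncated matches like these are acceptable losses for your data, the original pattern may be good enough; otherwise widening the character class for the local part is the smallest change.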