
I want to find valid email addresses in a text file, and this is my code:

import re
email = re.findall(r'[a-zA-Z\.-]+@[\w\.-]+', line)

But my code obviously does not match email addresses that have digits before the @ sign. It also fails to reject addresses that lack a valid ending (a dot followed by a top-level domain such as .com). Could anyone help me with these two problems? Thank you!

For example:

My code finds this address: xyz@gmail.com

but it does not find this one: xyz123@gmail.com

and it does not filter out this one either: xyz@gmail
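A minimal sketch that reproduces both problems with the pattern from the question (the helper name find_original is purely illustrative):

```python
import re

# The question's original pattern: the part before "@" may contain only
# letters, dots and hyphens, and nothing requires a dot in the domain.
ORIGINAL = re.compile(r'[a-zA-Z\.-]+@[\w\.-]+')

def find_original(line):
    """Return every substring that the original pattern matches."""
    return ORIGINAL.findall(line)

for sample in ["xyz@gmail.com", "xyz123@gmail.com", "xyz@gmail"]:
    print(sample, find_original(sample))
```

This prints ['xyz@gmail.com'] for the first sample, [] for the second (the character right before "@" is a digit, so the local-part class can never reach the "@"), and ['xyz@gmail'] for the third, because the domain part is not required to contain a dot.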

Parker

3 Answers


From the Python re docs: for str patterns, \w matches Unicode word characters, which include letters, digits and the underscore; with the re.ASCII flag it is equivalent to the set [a-zA-Z0-9_]. So [\w\.-] matches digits as well as letters, dots and hyphens.

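import re
# (?:...) is non-capturing, so findall returns whole addresses rather than only the group
email = re.findall(r'[\w\.-]+@[\w\.-]+(?:\.\w+)+', line)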
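A small runnable sketch of this approach. It uses a non-capturing group (?:...) for the trailing ".something" part, because when a pattern contains capturing groups, re.findall returns the captured groups instead of the whole match. The helper name find_emails is illustrative:

```python
import re

# Local part, "@", then a domain that must end in at least one ".word".
# The trailing group is non-capturing so findall returns the full address.
EMAIL = re.compile(r'[\w.-]+@[\w.-]+(?:\.\w+)+')

def find_emails(line):
    """Return all substrings of line that look like addresses with a dotted domain."""
    return EMAIL.findall(line)

print(find_emails("contact xyz123@gmail.com or xyz@gmail"))
```

This prints ['xyz123@gmail.com']: the digits are accepted and xyz@gmail is skipped. With a capturing group written as (\.[\w]+)+, re.findall would instead return ['.com'] for xyz123@gmail.com, i.e. only the last repetition of the group.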

This post discusses matching email addresses much more extensively, and there are several further pitfalls that your pattern does not catch. For example, an address cannot consist entirely of punctuation (...@....). Addresses are also subject to length limits: RFC 5321 caps the local part (before the @) at 64 octets, and individual mail servers may impose their own limits. Finally, many mail servers accept non-English (internationalized) characters. So depending on your needs, you may need a more comprehensive pattern.
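A rough sketch of two of these extra checks layered on top of a basic pattern: rejecting local or domain parts with no letters or digits at all, and limiting the local part to 64 octets. The helper looks_like_email is hypothetical, and this is nowhere near a full RFC 5322 validator:

```python
import re

# Basic shape: local part, "@", then a domain ending in ".something".
BASIC = re.compile(r'[\w.-]+@[\w.-]+(?:\.\w+)+')
MAX_LOCAL_OCTETS = 64  # RFC 5321 limit on the part before "@"

def looks_like_email(address):
    """Rough plausibility check (hypothetical helper, not an RFC 5322 validator)."""
    if not BASIC.fullmatch(address):
        return False
    local, _, domain = address.partition('@')
    # Reject parts made up entirely of punctuation, such as "...@gmail.com".
    if not any(ch.isalnum() for ch in local):
        return False
    if not any(ch.isalnum() for ch in domain):
        return False
    # Length limit on the local part, counted in UTF-8 octets.
    if len(local.encode('utf-8')) > MAX_LOCAL_OCTETS:
        return False
    return True

for candidate in ["xyz123@gmail.com", "...@gmail.com", "xyz@gmail", "a" * 65 + "@gmail.com"]:
    print(candidate, looks_like_email(candidate))
```

Only the first candidate passes. Doing the length and punctuation checks in plain Python keeps the regular expression readable; packing everything into one pattern is possible but much harder to maintain.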

Michael Delgado

Try the validate_email package.

pip install validate_email

Then

from validate_email import validate_email
is_valid = validate_email('example@example.com')
Harald Nordgren
^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$

Not mine, but I have used it in apps before.

Source
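As a sketch of how this anchored pattern might be used in Python, on one address at a time rather than with findall over a whole line (is_valid is an illustrative name). The first character class has to be written [\w.-], because Python's re module rejects [\w-\.] with a "bad character range" error:

```python
import re

# Anchored pattern for checking a single, already-extracted address.
# [\w.-] instead of [\w-\.]: a range cannot start at a class like \w.
VALIDATE = re.compile(r'^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$')

def is_valid(address):
    """True if the whole string matches the pattern."""
    return VALIDATE.match(address) is not None

for candidate in ["xyz@gmail.com", "xyz123@gmail.com", "xyz@gmail", "xyz@example.museum"]:
    print(candidate, is_valid(candidate))
```

This prints True, True, False, False. Note that {2,4} rejects real top-level domains longer than four characters such as .museum, and that $ also matches just before a trailing newline, so "xyz@gmail.com\n" is accepted; using re.fullmatch with the ^ and $ anchors removed avoids the latter.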

Thomas Harlan