I am trying to get all email addresses from a text file using a regular expression in Python, but the search always returns None (NoneType) when it is supposed to return the email address. For example:

import re

content = 'My email is lehai@gmail.com'
# Compare with suitable regex
emailRegex = re.compile(r'(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)')
mo = emailRegex.search(content)
print(mo.group())

I suspect the problem lies in the regex, but I could not figure out why.

  • Remove anchors: [`r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+'`](https://ideone.com/rxDrGq). – Wiktor Stribiżew Aug 03 '15 at 08:54
  • This is a terrible way to check email address. You might as well check for the presence of `@` only. – nhahtdh Aug 03 '15 at 09:12
  • See also [Using a regular expression to validate an email address](https://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address). – Peter Wood Aug 03 '15 at 10:17

3 Answers

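The `^` and `$` anchors force the pattern to match the entire string, and `content` contains other words and spaces around the address, so the search fails. Remove the `^` and `$` to match anywhere: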

([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)
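
As a minimal sketch of the difference (the names `anchored`, `unanchored`, and the second sample string are illustrative, not from the question), the anchored pattern fails on the sample string while the unanchored one finds the address. Since the goal is to collect every address in a file, `findall` is probably a better fit than `search`:

```python
import re

content = 'My email is lehai@gmail.com'

# Anchored: ^ and $ require the whole string to be an email address.
anchored = re.compile(r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$')
# Unanchored: the address may appear anywhere in the string.
unanchored = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')

anchored_match = anchored.search(content)      # no match, so this is None
unanchored_match = unanchored.search(content)  # a match object

print(anchored_match)
print(unanchored_match.group())

# Illustrative text with two addresses; findall returns all of them.
text = 'Write to lehai@gmail.com or admin@example.org today'
all_addresses = unanchored.findall(text)
print(all_addresses)
```

Note that the last character class also accepts `.`, so an address followed directly by a sentence-ending period would include that period in the match.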

Aziz Alfoudari

Try this regex, though I am not at all sure it will work for you:

([^@|\s]+@[^@]+.[^@|\s]+)

Humoyun Ahmad
  • `|` inside character class is a red flag - are you sure you want to exclude it from the character class? – nhahtdh Aug 03 '15 at 09:12
  • @nhahtdh, As I said, I am not sure that it will work in all cases; it worked for me when I faced the same problem – Humoyun Ahmad Aug 03 '15 at 09:41
  • `|` should not be used to say "alternation" in character class. `[@\s]` is already an alternation of 2 choices `@` or space character class. The extra `|` will add the literal `|` to the list of alternation. – nhahtdh Aug 03 '15 at 09:51
  • @nhahtdh, I just checked again and it worked on 'My email is lehai@gmail.com' perfectly, returning "lehai@gmail.com" – Humoyun Ahmad Aug 03 '15 at 11:24
  • I'm not commenting about the correctness of your regex with respect to the question. I'm saying `|` should not be used as "alternation" in character class. – nhahtdh Aug 03 '15 at 11:33
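
To illustrate the point about `|` made in the comments, here is a small sketch (the sample strings and variable names are made up for illustration). Inside a character class the pipe is a literal character, so `[@|\s]` matches `|` in addition to `@` and whitespace, and the negated class `[^@|\s]` refuses it:

```python
import re

# Inside [...] the pipe is a literal character, not alternation.
with_pipe = re.compile(r'[@|\s]')
without_pipe = re.compile(r'[@\s]')

sample = 'a|b'
pipe_hit = with_pipe.search(sample)        # matches the literal '|'
no_pipe_hit = without_pipe.search(sample)  # None: no '@' or whitespace present

print(pipe_hit.group())
print(no_pipe_hit)

# Consequence for the negated class: [^@|\s]+ stops at '|' as well.
stops_at_pipe = re.match(r'[^@|\s]+', 'le|hai@gmail.com').group()
print(stops_at_pipe)
```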

Your regular expression doesn't match the string.

I normally call the regex search like this:

mo = re.search(regex, searchstring) 

So in your case I would try:

import re

content = 'My email is lehai@gmail.com'
# Compare with suitable regex
emailRegex = re.compile(r'gmail')
mo = re.search(emailRegex, content)
print(mo.group())

You can test your regex at https://regex101.com/. This will work for your example, since only the `$` anchor is kept and the address sits at the end of the string:

([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)
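
Whichever pattern is used, `search` returns `None` when nothing matches, so calling `.group()` on the result directly raises `AttributeError`. A minimal sketch of a guard, assuming the `$`-anchored pattern above (the helper name `first_email` and the second sample string are hypothetical):

```python
import re

# Pattern from above: the trailing $ requires the address to end the string.
EMAIL_AT_END = re.compile(r'([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)')

def first_email(text):
    """Return the matched address, or None if the pattern does not match."""
    mo = EMAIL_AT_END.search(text)
    return mo.group() if mo is not None else None

found = first_email('My email is lehai@gmail.com')
missing = first_email('lehai@gmail.com is my email')  # address not at the end

print(found)
print(missing)
```
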
niallhaslam