-1

How can I find an email address with a regex, using mailto: as the marker? I tried the expression below, but it only captures a small portion of the address.

import re
html_content='''
<p><a href="mailto:info@mohindraroto.com">info@mohindraroto.com</a></p>
'''
row = re.findall(r'mailto:(\w*.)',html_content)[0]
print(row)

It gives me:

info@

Any help modifying my existing expression, or creating a new one that finds the email address, would be highly appreciated.

SIM
  • Here's a [nice little regex](http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html) that can find email-addresses. – timgeb Nov 20 '17 at 18:50
  • [Don't use regex to parse HTML](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags), or email addresses either. – tripleee Nov 20 '17 at 18:52
  • @timgeb, Thanks for the link to your nice little regex. I didn't see any `mailto:` flag used in that expression. – SIM Nov 20 '17 at 18:52
  • Use [this](https://stackoverflow.com/questions/31416858/is-it-possible-to-find-all-elements-with-a-custom-html-attribute-in-beautiful-so) to find all instances of `mailto` and then do your magic. – ctwheels Nov 20 '17 at 18:58

3 Answers

4

For your example, I would suggest matching a pattern that starts with mailto: followed by any run of characters other than a double quote:

row = re.findall(r'mailto:([^"]*)',html_content)
print(row)
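
A self-contained version of that suggestion, as a minimal sketch: it reuses html_content from the question, and the names emails and first are only illustrative. It assumes the href value is wrapped in double quotes.

```python
import re

html_content = '''
<p><a href="mailto:info@mohindraroto.com">info@mohindraroto.com</a></p>
'''

# Capture everything after "mailto:" up to, but not including, the next double quote.
emails = re.findall(r'mailto:([^"]*)', html_content)
print(emails)

# The question indexed [0]; guard against an empty result instead of raising IndexError.
first = emails[0] if emails else None
print(first)
```

Note that a link such as mailto:someone@example.com?subject=Hi would have its query string captured too, since the pattern only stops at the closing quote.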
Yoda
2

This pattern, (?:.*mailto:)([^"]*)", will work as well. It uses a non-capturing group to match everything up to and including mailto:, then captures the text that follows, stopping at the closing ", which is matched but not captured.
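
One thing worth checking with this pattern is how the greedy .* behaves when a single line holds more than one link. The sketch below uses a made-up line with two hypothetical example.com addresses (not from the question) and compares the result with the simpler mailto:([^"]*) pattern.

```python
import re

# Hypothetical input: two mailto links on the same line.
two_links = '<a href="mailto:a@example.com">a</a> <a href="mailto:b@example.com">b</a>'

# The greedy .* runs to the end of the line and backtracks to the LAST "mailto:",
# so only the final address on the line is captured.
greedy = re.findall(r'(?:.*mailto:)([^"]*)"', two_links)
print(greedy)

# Without the leading .*, every occurrence is found.
simple = re.findall(r'mailto:([^"]*)', two_links)
print(simple)
```

For the single-link example in the question both patterns return the same address, so the difference only matters when several links share a line.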

Patrick Artner
0

This will work based on your current example:

'mailto:(\w*.\w*.com)'

This works as long as it is a .com email address.

I think your regex stops after the @ because \w* only matches word characters, so it consumes info, the . then matches the @, and the pattern ends there.

SPYBUG96
  • `\w` is not a "word"; it is shorthand for `a-zA-Z0-9_`. This wouldn't work for an email such as `chris.me@gmail.com`, nor `chris-me@gmail.com`. The `.` is loosely matching the `@`. – chris85 Nov 20 '17 at 18:54
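
The limitation described in that comment can be checked directly. This is a small sketch, not part of the original answer: it runs the answer's pattern against the question's address and the two addresses from the comment, and the names pattern, samples, and results are illustrative.

```python
import re

# Pattern from the answer, unchanged: the two dots are unescaped wildcards.
pattern = r'mailto:(\w*.\w*.com)'

samples = [
    'mailto:info@mohindraroto.com',
    'mailto:chris.me@gmail.com',
    'mailto:chris-me@gmail.com',
]

# Map each sample to whatever the pattern captures (an empty list means no match).
results = {s: re.findall(pattern, s) for s in samples}
for sample, found in results.items():
    print(sample, '->', found)
```

Only the first sample matches: in the other two, the first wildcard is used up by the . or - in the local part, so the second wildcard lands on the @ and the literal com cannot match gmail.com.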