2

I have been using this: (I know, there are probably more efficient ways...)

Given this in an email message:

Submitted data:
First Name: MyName
Your Email Address: email@domain.com
TAG:

I coded this:

intStart = (bodystring.rfind('First ')) + 12
intEnd = (bodystring.rfind('Your Email'))
receiver_name = bodystring[intStart:intEnd]

intStart = (bodystring.rfind('Your Email Address: ')) + 20
intEnd = (bodystring.rfind('TAG:'))
receiver_email = bodystring[intStart:intEnd]

... and got what I needed. This worked because I had the 'TAG' label.

Now I am given this:

Submitted data:
First name: MyName
Last name:
Email: email@domain.com

I'm having a brain block on getting the email address without a next word. There is whitespace. Can someone nudge me in the right direction? I suspect I can dig out the email address after the occurrence of 'Email:' using regex...

Mad Physicist
Andy Delgado

3 Answers

2

Yes, you can use a regular expression to extract e-mail addresses.

  • To find a single address in a text, use re.search() and call .group() on the resulting match

  • To find all addresses in a text, use re.findall()

An example

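    import re
    
    text = "First name: MyName Last name: Email: email@domain.com "
    # Simple e-mail-like pattern; IGNORECASE lets it match upper-case letters too
    pattern = re.compile(r"[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+", re.IGNORECASE)
    
    # search() returns the first match, or None if nothing matches
    email = pattern.search(text)
    if email:
        print(email.group())
    
    # findall() returns every match as a list of strings
    emails = pattern.findall(text)
    print(emails)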

This prints:

email@domain.com
['email@domain.com']
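
As a rough sketch of how the same pattern might be applied to the multi-line message from the question, the helper below returns None instead of raising when nothing matches. The name first_email and the re.IGNORECASE flag are assumptions added for illustration. Keep in mind that this pattern picks up any address-like string in the text, not specifically the one after the Email: label.

```python
import re

# Same character classes as in the answer; IGNORECASE is an added assumption
# so that addresses containing upper-case letters are matched as well.
EMAIL_RE = re.compile(r"[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+", re.IGNORECASE)

def first_email(text):
    """Return the first e-mail-like substring in text, or None."""
    match = EMAIL_RE.search(text)
    return match.group() if match else None

body = "Submitted data:\nFirst name: MyName\nLast name:\nEmail: email@domain.com\n"
print(first_email(body))
print(first_email("no address here"))
```

Returning None keeps the "nothing found" case explicit, so the caller can decide how to handle a message without an address.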
Tharun K
1

Searching for strings is often better done with splitting, and occasionally regular expressions. So first split the lines:

bodylines = bodystring.splitlines()

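Split each line on its first : delimiter, skipping lines without a colon so that every chunk is exactly a key/value pair (this makes a generator):

chunks = (line.split(':', 1) for line in bodylines if ':' in line)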

Now grab the first one that has "email" on the left and @ on the right:

address = next(val.strip() for key, val in chunks if 'email' in key.lower() and '@' in val)

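If you want all the emails across multiple lines, use a list comprehension instead of next. Note that chunks is a generator and can only be consumed once, so recreate it first if you have already called next on it: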

addresses = [val.strip() for key, val in chunks if 'email' in key.lower() and '@' in val]

This can be done in one line with no imports (if you replace chunks with its definition, not that I recommend it). Regular expressions are a much heavier tool: they let you specify far more general patterns, but they are often slower as a result. If you can get away with simple and effective tools, do it: don't bring in the sledgehammer until you need it!
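
The steps above could be put together roughly as follows. This is only a sketch: the function name find_email is made up for illustration, and it swaps split for str.partition, which always returns three parts, so blank lines and lines without a colon cannot break the unpacking.

```python
def find_email(bodystring):
    """Return the first value on an 'email' line that contains '@', or None."""
    for line in bodystring.splitlines():
        # partition splits on the first ':' only; sep is '' when there is no colon
        key, sep, val = line.partition(':')
        if sep and 'email' in key.lower() and '@' in val:
            return val.strip()
    return None

body = "Submitted data:\nFirst name: MyName\nLast name:\n\nEmail: email@domain.com\n"
print(find_email(body))
```

Because the check is on the label rather than on a fixed offset, the same function should handle both the old "Your Email Address:" layout and the new "Email:" layout.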

Mad Physicist
1

If the email should come after the word Email followed by a :, you could match the label part and capture the address in a group using an email-like pattern.

\bEmail[^:]*:\s*([^\s@]+@[^\s@]+)
  • \bEmail A word boundary to prevent a partial match, match Email
  • [^:]*:\s* Match optional chars other than :, then match : and optional whitespace chars
  • ( Capture group 1
    • [^\s@]+@[^\s@]+ Match a single @ between runs of one or more non-whitespace chars, excluding the @ itself
  • ) Close group 1
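
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE. The following is a sketch of that idea; the variable name EMAIL_AFTER_LABEL is an assumption made for illustration.

```python
import re

# The same pattern, spelled out with re.VERBOSE so each part carries a comment
EMAIL_AFTER_LABEL = re.compile(r"""
    \bEmail         # the word Email, starting at a word boundary
    [^:]*:          # any characters other than ':', then the ':' itself
    \s*             # optional whitespace
    (               # capture group 1
      [^\s@]+@[^\s@]+   # one '@' between runs of non-whitespace, non-'@' chars
    )
""", re.VERBOSE)

match = EMAIL_AFTER_LABEL.search("Last name:\nEmail: email@domain.com")
print(match.group(1))
```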

Regex demo

Example with re.findall that returns the values of the capture groups:

import re
 
regex = r"\bEmail[^:]*:\s*([^\s@]+@[^\s@]+)"
 
s = ("Submitted data:\n"
    "First Name: MyName\n"
    "Your Email Address: email@domain.com\n"
    "TAG:\n\n"
    "Submitted data:\n"
    "First name: MyName\n"
    "Last name:\n"
    "Email: email@domain.com")
 
print(re.findall(regex, s))

Output

['email@domain.com', 'email@domain.com']
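
If only one address is expected per message, re.search with group(1) may be more convenient than re.findall. The sketch below assumes a helper named email_after_label (a made-up name) and adds re.IGNORECASE so that labels such as "email:" also match; neither is part of the pattern above.

```python
import re

# IGNORECASE is an added assumption, so "email:" or "EMAIL:" labels match too
regex = re.compile(r"\bEmail[^:]*:\s*([^\s@]+@[^\s@]+)", re.IGNORECASE)

def email_after_label(text):
    """Return the address following an Email label, or None if there is none."""
    match = regex.search(text)
    return match.group(1) if match else None

print(email_after_label("First name: MyName\nLast name:\nEmail: email@domain.com"))
print(email_after_label("First name: MyName\nLast name:"))
```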
The fourth bird