How to split email with python regex?

Question

I'm trying to isolate the value that comes after the "+" sign in an email address. For example, if I have "something+company@gmail.com", I want to get the value "company". It seems like the + sign kind of messes up the regex and I don't know where to go from here.

Here is what I wrote using re:

re.findall(r'something+(.*?)@',st)

See http://ideone.com/Jv3hGx and https://stackoverflow.com/questions/399078/what-special-characters-must-be-escaped-in-regular-expressions — Wiktor Stribiżew, May 30 '17 at 21:13

score 2 · Accepted Answer · answered May 30 '17 at 21:14

+ acts like a special character (a repetition operator) when defining a regular expression. You need \ to escape it:

>>> import re
>>> st = "something+company@gmail.com"
>>> re.findall(r'something\+(.*?)@', st)
['company']
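If the literal prefix comes from a variable rather than being typed into the pattern, `re.escape` can add the backslashes for you. The following is only a sketch: the `prefix` variable is a hypothetical name introduced for illustration and is not part of the original question.

```python
import re

st = "something+company@gmail.com"
prefix = "something+"  # hypothetical literal prefix, e.g. read from elsewhere

# re.escape backslash-escapes regex metacharacters such as "+",
# so the prefix is matched character for character.
pattern = re.escape(prefix) + r"(.*?)@"
tags = re.findall(pattern, st)
print(tags)  # ['company']
```

This gives the same result as writing `\+` by hand, and it avoids having to remember which characters are special.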

answered May 30 '17 at 21:14

Ozgur Vatansever


Martin Tournoij · Answer 2 · score 1 · 2017-05-30T21:17:21.650

The problem with your regexp is that + is a special character meaning "repeat the previous character one or more times". In your case, g+ matches the single g, and the (.*?) group then captures the literal + together with the text after it, so you get ['+company'].
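To see the difference side by side, here is a small sketch that runs both patterns against the address from the question. The variable names `unescaped` and `escaped` are only for illustration.

```python
import re

email = "something+company@gmail.com"

# Unescaped: "g+" means "one or more g", so the "+" in the address is not
# matched literally and ends up inside the capture group instead.
unescaped = re.findall(r"something+(.*?)@", email)

# Escaped: "\+" matches a literal plus sign, so the group starts after it.
escaped = re.findall(r"something\+(.*?)@", email)

print(unescaped)  # ['+company']
print(escaped)    # ['company']
```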

The solution is to escape the + by preceding it with a \:

>>> import re
>>> email = 'something+company@gmail.com'
>>> re.findall(r'something\+(.*?)@', email)
['company']

Having said that, you don't really need a regular expression here.

Your goal is to get all text between the first + and the first @, which you can do with:

>>> email = 'something+company@gmail.com'
>>> email[email.find('+')+1:email.find('@')]
'company'

Note that this code will give unexpected results if there's no + or @, because str.find returns -1 when the substring is missing. You'll probably want to add a check around this (e.g. if '+' in email: ...).
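One possible way to add that check is sketched below. The function name `plus_tag` and the choice to return `None` for addresses without a usable tag are assumptions made for this example, not something from the original answer.

```python
def plus_tag(email):
    """Return the text between the first '+' and the first '@', or None."""
    plus = email.find('+')
    at = email.find('@')
    # str.find returns -1 when the substring is absent; also require that
    # the '+' comes before the '@', otherwise the slice would be meaningless.
    if plus == -1 or at == -1 or plus > at:
        return None
    return email[plus + 1:at]


print(plus_tag('something+company@gmail.com'))  # company
print(plus_tag('something@gmail.com'))          # None
```

Returning `None` keeps the "no tag" case distinct from an empty tag such as `'something+@gmail.com'`, which yields an empty string.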

In addition, you can actually have quoted @s and such in emails, so this is not 100% RFC-compliant. However, last time I checked many MTAs and email clients don't support that anyway, so it's not really something you need to worry about as such.

edited May 30 '17 at 21:17

answered May 30 '17 at 21:11

Martin Tournoij


what's wrong with `email.split("@")[0].split("+")[-1]`? – Ozgur Vatansever May 30 '17 at 21:12
Nothing as such @OzgurVatansever. I would consider using `find` more readable, but that's entirely a subjective judgement call. – Martin Tournoij May 30 '17 at 21:14
yeah email.split("@")[0].split("+")[-1] works but can it be done via re? – inertia May 30 '17 at 21:14
Yes, I've updated my answer @jake. I don't think that using regexps is a good solution, though. As you've found out, they can be hard to understand and debug. – Martin Tournoij May 30 '17 at 21:18
Yeah I agree. I was just practicing but was so curious. Thanks for your input my man. – inertia May 30 '17 at 21:20
