
I'm trying to isolate the value that comes after the "+" sign in an email address. For example, if I have "something+company@gmail.com", I want to get the value "company". It seems like the + sign messes up the regex, and I don't know where to go from here.

Here is what I wrote using re:

re.findall(r'something+(.*?)@',st)
inertia

2 Answers


+ acts like a special character (a repetition operator) when defining a regular expression. You need \ to escape it:

>>> import re
>>> st = "something+company@gmail.com"
>>> re.findall(r'something\+(.*?)@', st)
['company']
Ozgur Vatansever

The problem with your regexp is that + is a special character meaning "repeat the previous character one or more times". In your case, g+ matches the single g at the end of "something", and the (.*?) group then captures the literal + along with everything else up to the @.
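
To see this concretely, here is a small sketch that runs the unescaped pattern from the question against the sample address. The variable names are only for illustration.

```python
import re

email = 'something+company@gmail.com'

# Unescaped: 'g+' means "one or more g", so the literal '+' in the
# address is not consumed by the prefix and ends up inside the group.
unescaped = re.findall(r'something+(.*?)@', email)

print(unescaped)  # ['+company']
```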

The solution is to escape the + by preceding it with a \:

>>> import re
>>> email = 'something+company@gmail.com'
>>> re.findall(r'something\+(.*?)@', email)
['company']
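
If the part before the + is not always "something", a variation along these lines may help. This is only a sketch: the helper name `extract_tag_regex` is made up for illustration, and it assumes you want the text between the first + and the next @. It also shows `re.escape`, which escapes special characters in a literal prefix so you don't have to add the backslash by hand.

```python
import re

def extract_tag_regex(address):
    """Return the text between the first '+' and the next '@', or None.

    Hypothetical helper for illustration; not part of the re module.
    """
    # '\+' is a literal plus; '[^@]*' captures everything up to the '@'.
    match = re.search(r'\+([^@]*)@', address)
    return match.group(1) if match else None

# re.escape turns 'something+' into a pattern that matches it literally.
prefix = 'something+'
pattern = re.escape(prefix) + r'(.*?)@'

email = 'something+company@gmail.com'
print(extract_tag_regex(email))    # company
print(re.findall(pattern, email))  # ['company']
```

Returning None when nothing matches makes the "no tag" case explicit for the caller.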

Having said that, you don't really need a regular expression here.

Your goal is to get all text between the first + and the first @, which you can do with:

>>> email = 'something+company@gmail.com'
>>> email[email.find('+')+1:email.find('@')]
'company'

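Note that this code will give unexpected results if there's no + or @, because find() returns -1 when the character is missing. With no +, for example, the slice starts at index 0 and returns everything before the @. You'll probably want to add a check around this (e.g. if '+' in email: ...).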
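
One way to add that check is sketched below. The function name `extract_tag_slice` is hypothetical, and the sketch assumes that returning None is an acceptable way to signal a missing + or @.

```python
def extract_tag_slice(email):
    """Return the text between the first '+' and the following '@', or None.

    Hypothetical helper for illustration only.
    """
    plus = email.find('+')
    if plus == -1:
        return None  # no '+' in the address
    # Search for '@' only after the '+', so an earlier '@' is ignored.
    at = email.find('@', plus + 1)
    if at == -1:
        return None  # no '@' after the '+'
    return email[plus + 1:at]

print(extract_tag_slice('something+company@gmail.com'))  # company
print(extract_tag_slice('something@gmail.com'))          # None
```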

In addition, you can actually have quoted @s and such in email addresses, so this is not 100% RFC-compliant. However, last time I checked, many MTAs and email clients didn't support that anyway, so it's not really something you need to worry about.

Martin Tournoij