
I've already read this and this and this and lots of others. They don't answer my problem.

I'd like to split a string that may contain emails or strings starting with "@" (like emails, but without the text before the "@"). I've tried many patterns, and one of the simplest that begins to get close is:

import re
re.split(r'(@)', "test @aa test2 @bb @cc t-es @dd-@ee, test@again")
Out[40]: 
['test ', '@', 'aa test2 ', '@', 'bb ', '@', 'cc t-es ', '@', 'dd-', '@', 'ee, test', '@', 'again']

I'm looking for the right regexp that could give me:

['test ', '@aa', 'test2 ', '@bb ', '@cc', 't-es ', '@dd-', '@ee', 'test@again']
Wiktor Stribiżew
Olivier Pons

2 Answers


Why try to split when you can go "yo regex, give me all that matches":

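import re

test = "test @aa test2 @bb @cc t-es @dd-@ee, test@again"

# Raw string so that \s reaches the regex engine unchanged.
# findall returns every non-overlapping match, scanning left to right.
print(
    re.findall(r"[^\s@]*?@?[^@]* |[^@]*@[^\s@]*", test)
)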
# ['test ', '@aa test2 ', '@bb ', '@cc t-es ', '@dd-', '@ee, ', 'test@again']

I couldn't make the regex any shorter, but at least it works, and who expects a regex to be short anyway?


As per the OP's new (or corrected) requirements:

[^\s@]*?@?[^\s@]* |[^@]*@[^\s@]* 
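To see what the updated pattern does on the question's input, here is a minimal sketch. It assumes the pattern has no trailing space, and the final cleanup step (stripping whitespace and a trailing comma) is my own assumption about what the OP wants, not part of the pattern itself.

```python
import re

test = "test @aa test2 @bb @cc t-es @dd-@ee, test@again"

# Updated pattern from this answer (assumed to carry no trailing space).
updated_pattern = r"[^\s@]*?@?[^\s@]* |[^@]*@[^\s@]*"

# Each match is either a space-terminated token or a token containing "@".
pieces = re.findall(updated_pattern, test)
print(pieces)

# Hypothetical cleanup: drop surrounding whitespace and a trailing comma.
cleaned = [p.strip().rstrip(",") for p in pieces]
print(cleaned)
```

The raw matches keep their trailing space (and the comma after `@ee`), which is why the cleanup pass is there if you need the exact list from the question.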
Işık Kaplan
  • You are on the right path, but I was wrong in my question, I've edited it, could you check it again? Thank you very much! – Olivier Pons May 24 '19 at 11:27
  • You need to decide: either you get `'@aa test2 '`, or you get `'@cc', 't-es '`. If we split on whitespace as in the second case, the first one gets split too; if we keep them together, the second one stays together too. – Işık Kaplan May 24 '19 at 11:29
  • OMG I was wrong again, sorry, I've edited it, could you check it again? Thank you very much! – Olivier Pons May 24 '19 at 11:33
  • I'm almost there: `re.findall("[^@\s]*@[^\[@!\?\]\s]*", "test. @aa test2 t-es @dd-@ee, @ff. @ff! test@again.com ns@gmail.com")` but I want to stop before unwanted chars like `!` or `?` – Olivier Pons May 24 '19 at 11:36
  • It matches all you want, what is the problem? do you want match the `!` too or what? – Işık Kaplan May 24 '19 at 11:39
  • It's the opposite: I want to match only valid email characters, and *even though* I've excluded the `!` and the `?` "by hand", I don't think that's a good solution. The idea would be: "keep emails matched by a good regexp, *or* `@xxx` too" – Olivier Pons May 24 '19 at 11:51
  • It is hard to re-write the regex every time you want to add a new qualifier. Instead you should've started with this, `I want emails, and emails without the part before @ symbol` Anyway, give this `[\w.+-]*@[\w.+-]*` a shot – Işık Kaplan May 24 '19 at 11:57
  • I've checked your answer and it's good, so I've accepted it, and I've posted mine as well. Have a good day! – Olivier Pons May 24 '19 at 13:07
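For reference, the pattern suggested in the comments can be tried like this. It is only a sketch run against the question's original string; the variable names are mine.

```python
import re

# Pattern from the comment thread: optional local part, "@", then
# word characters, dots, plus signs or hyphens.
comment_pattern = re.compile(r"[\w.+-]*@[\w.+-]*")

sample = "test @aa test2 @bb @cc t-es @dd-@ee, test@again"

# Only the "@" tokens are returned; plain words are skipped.
found = comment_pattern.findall(sample)
print(found)
```

Note that both sides of the "@" are optional here, so a lone "@" surrounded by spaces would match as well.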

My own solution, which combines an email pattern with a simple "@[:alphanum:]+" pattern for bare usernames, is:

import re

USERNAME_OR_EMAIL_REGEX = re.compile(
    r"@[a-zA-Z0-9-]+"  # simple username
    r"|"
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"  # email local part
    r"@"  # followed by the domain name:
    r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"  # first label
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)")  # exactly one more ".label"
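A quick check of how this pattern behaves, as a sketch using the test string from the comments. One thing to be aware of: the last group has no quantifier, so the domain must have exactly two labels. The `MULTI_LABEL_REGEX` variant with a trailing `+` is my own assumption about a possible fix, not part of the answer.

```python
import re

# Pattern copied from the answer above.
USERNAME_OR_EMAIL_REGEX = re.compile(
    r"@[a-zA-Z0-9-]+"
    r"|"
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@"
    r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)")

sample = "test. @aa test2 t-es @dd-@ee, @ff. @ff! test@again.com ns@gmail.com"
matches = USERNAME_OR_EMAIL_REGEX.findall(sample)
print(matches)

# Without a dot in the domain, only the "@name" branch can match.
no_dot = USERNAME_OR_EMAIL_REGEX.findall("test@again")

# With three domain labels, the match stops after the second one.
truncated = USERNAME_OR_EMAIL_REGEX.findall("a@mail.example.org")

# Hypothetical variant: "+" on the last group accepts one or more ".label" parts.
MULTI_LABEL_REGEX = re.compile(USERNAME_OR_EMAIL_REGEX.pattern + "+")
full = MULTI_LABEL_REGEX.findall("a@mail.example.org")
print(no_dot, truncated, full)
```

Because every group is non-capturing, `findall` returns the whole match for each hit rather than group contents.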
Olivier Pons