Python regex for capturing comma-separated strings conditionally

Question

I have a list of person names that can follow three different styles:

{last name}, {first name} {middle name} (Example: Bob, Dylan Tina)
{last name}, {first name} {middle initial}. (Example: Bob, Dylan T.)
{last name}, {first name} (Example: Bob, Dylan)

And this is the regex which I wrote:

^[a-zA-Z]+(([' ,.-][a-zA-Z ])?[a-zA-Z]*)*$

But it doesn't work.
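One way to narrow down "doesn't work" is to run the pattern against the three examples. This is a minimal sketch; the names `PATTERN` and `matches` are only illustrative and are not part of the original post.

```python
import re

# Pattern copied verbatim from the question.
PATTERN = re.compile(r"^[a-zA-Z]+(([' ,.-][a-zA-Z ])?[a-zA-Z]*)*$")

def matches(name):
    """Return True if the whole name is accepted by PATTERN."""
    return PATTERN.match(name) is not None

for example in ("Bob, Dylan Tina", "Bob, Dylan T.", "Bob, Dylan"):
    print(example, "->", matches(example))
```

Under this sketch, "Bob, Dylan T." is rejected: the pattern only accepts a separator such as `.` when a letter or space follows it, so a trailing period cannot match. The other two names are accepted, but the pattern only validates the string. A repeated group keeps just its last repetition, so the individual name parts cannot be read back from it.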

Does this answer your question? [How to capture multiple repeated groups?](https://stackoverflow.com/questions/37003623/how-to-capture-multiple-repeated-groups) — Ulysse BN, Dec 09 '19 at 07:43

Albin Paul · Accepted Answer · 2019-12-09T07:41:36.313


You could write the regex like this:

^(\w+),\s(\w+)\s*(\w*\.?)$

Here is the demo.

To get a separate group for each of your three cases, update the regex like this:

^(\w+,\s\w+\s\w+)$|^(\w+,\s\w+\s\w+\.)$|^(\w+,\s\w+)$

Here is the demo.
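As a rough sketch of how the three-group pattern could be used to tell the cases apart: only one alternative can match, so `match.lastindex` identifies which group was filled. The `STYLES` labels and the `classify` helper are hypothetical names introduced for this example.

```python
import re

# Pattern copied from the answer: one capture group per name style.
PATTERN = re.compile(r"^(\w+,\s\w+\s\w+)$|^(\w+,\s\w+\s\w+\.)$|^(\w+,\s\w+)$")

# Hypothetical labels, listed in the same order as the groups in PATTERN.
STYLES = ("middle name", "middle initial", "no middle name")

def classify(name):
    """Return the label of the alternative that matched, or None."""
    match = PATTERN.match(name)
    if match is None:
        return None
    # Exactly one group participates in a match, so lastindex is its number.
    return STYLES[match.lastindex - 1]

for example in ("Bob, Dylan Tina", "Bob, Dylan T.", "Bob, Dylan"):
    print(example, "->", classify(example))
```

Note that the second alternative accepts any word followed by a period, not just a single initial, so "middle initial" here is only as strict as the pattern itself.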

Here is the Python code:

import re

s2 = "Bob, Dylan"
# findall returns one tuple of captured groups per match
out = re.findall(r"^(\w+),\s(\w+)\s*(\w*\.?)$", s2)
print(out)

OUTPUT

[('Bob', 'Dylan', '')]
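The same pattern can be checked against all three name styles at once. The following is a small sketch, assuming a helper called `split_name` (a name made up for this example) that returns `None` for the middle part when it is absent.

```python
import re

# First pattern from the answer: last name, first name, optional middle part.
PATTERN = re.compile(r"^(\w+),\s(\w+)\s*(\w*\.?)$")

def split_name(name):
    """Return (last, first, middle) or None if the name does not match."""
    match = PATTERN.match(name)
    if match is None:
        return None
    last, first, middle = match.groups()
    # An empty third group means there was no middle name or initial.
    return last, first, middle or None

for example in ("Bob, Dylan Tina", "Bob, Dylan T.", "Bob, Dylan"):
    print(split_name(example))
```

With the three example names this should print `('Bob', 'Dylan', 'Tina')`, `('Bob', 'Dylan', 'T.')` and `('Bob', 'Dylan', None)`.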

edited Dec 09 '19 at 07:41

answered Dec 09 '19 at 06:44

Albin Paul


Is it possible to write regex for each case separately? – Dec 09 '19 at 06:46
@user8714896 Yes it is possible. – Albin Paul Dec 09 '19 at 06:49
This will return `[('Bob', 'Dyla', 'n')]` for `re.findall(p, "Bob, Dylan")`. – Zeinab Abbasimazar Dec 09 '19 at 06:57
@ZeinabAbbasimazar Which one are you talking about? – Albin Paul Dec 09 '19 at 06:58
@ZeinabAbbasimazar It works all right in the demo. – Albin Paul Dec 09 '19 at 07:02
I was talking about your first suggested regex Albin; `^(\w+),\s(\w+)\s*(\w*\.?)$`. – Zeinab Abbasimazar Dec 09 '19 at 07:09
@ZeinabAbbasimazar It works fine for me. I just tested it in a Python script. I don't know why it is not working for you. – Albin Paul Dec 09 '19 at 07:19
What is your Python version, and what Python code are you using? – Zeinab Abbasimazar Dec 09 '19 at 07:34
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/203882/discussion-between-albin-paul-and-zeinab-abbasimazar). – Albin Paul Dec 09 '19 at 07:41

score 0 · Answer 2 · answered Dec 09 '19 at 06:46

You should use this regex:

(\w+),\s*(\w+)\s*(\w{0,}\.*)

This is the result you'll get:

>>> import re
>>> s1 = "Bob, Dylan Tina"
>>> s2 = "Bob, Dylan"
>>> s3 = "Bob, Dylan T."
>>> p = re.compile(r"(\w+),\s*(\w+)\s*(\w{0,}\.*)")
>>> re.findall(p, s1)
[('Bob', 'Dylan', 'Tina')]
>>> re.findall(p, s2)
[('Bob', 'Dylan', '')]
>>> re.findall(p, s3)
[('Bob', 'Dylan', 'T.')]
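Because this pattern has no `^` or `$` anchors, `findall` will also accept a string that only partly fits. One possible way to require the whole string to fit is `fullmatch`, as in this sketch. The `parse_strict` helper and the four-word input are assumptions made for illustration.

```python
import re

# Pattern copied from the answer; it has no ^ or $ anchors.
PATTERN = re.compile(r"(\w+),\s*(\w+)\s*(\w{0,}\.*)")

def parse_strict(name):
    """Return (last, first, middle) only if the whole string fits the pattern."""
    match = PATTERN.fullmatch(name)
    return match.groups() if match else None

extra = "Bob, Dylan Tina Marie"  # hypothetical input with a fourth word

print(PATTERN.findall(extra))         # matches a prefix and ignores "Marie"
print(parse_strict(extra))            # None: the whole string does not fit
print(parse_strict("Bob, Dylan T."))  # ('Bob', 'Dylan', 'T.')
```

Whether the stricter behaviour is wanted depends on how clean the input list is; for the three styles in the question both approaches give the same groups.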
