Slice regex back reference? Nocando?

Question

Names in the form Nelson, Craig T. need to be split into

AN Nelson
FN Craig
IT C.T.

IT means initials, note the first initial is the first letter of FN, first name.

I already have a bunch of regex patterns. For this one, I suspect regex won't do, because you can't slice a backreference.

import re

name = r'Nelson, Craig T.'
pat = r'([^\W\d_]+),\s([^\W\d_]+\s?)\s(([A-Z]\.?)+)\s?$'
rep = r'AN \1\nVN \2\nsf \3\n'  

split = re.sub(pat, rep, name)
print(split)

will produce:

AN Nelson
VN Craig
sf T.

Ideally I'd somehow slice the \2, add a full stop and stick \3 behind it. I think this is not possible with regex and I should use a string operation, HOWEVER, it wouldn't be the first time I'd learn a trick here that I hadn't deduced from the documentation. (Thanks guys.)

Just a general comment: do you really find `[^\W\d_]` more readable than `[a-zA-Z]`? I have to say I had to think about that character class for a few seconds. ;) — Martin Ender, Apr 18 '13 at 21:22
@m.buettner I think that the answer could be found in an answer to another question of the author: [Python 3 regex with diacritics and ligatures](http://stackoverflow.com/questions/15936315/python-3-regex-with-diacritics-and-ligatures) — Aleksei Zyrianov, Apr 18 '13 at 21:32
@Alexey fair enough... I thought Python's built-in character classes only use Unicode properties if used with the `re.U` modifier. — Martin Ender, Apr 18 '13 at 21:45
@RolfBly Please don't forget to choose the right answer if your question was solved ;) — Aleksei Zyrianov, Apr 24 '13 at 14:31

score 4 · Answer 1 · answered Apr 18 '13 at 21:19

You may use one more group for the first initial like this:

pat = r'([^\W\d_]+),\s(([^\W\d_])[^\W\d_]*\s?)\s(([A-Z]\.?)+)\s?$'
rep = r'AN \1\nVN \2\nIT \3.\4\n'

I've also corrected having sf instead of IT for initials in the rep variable.
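
For reference, here is a minimal sketch that runs this pattern end to end on the name from the question. The variable name `result` is only for illustration.

```python
import re

name = 'Nelson, Craig T.'

# Group 3 sits inside group 2 and captures only the first letter of the first name.
pat = r'([^\W\d_]+),\s(([^\W\d_])[^\W\d_]*\s?)\s(([A-Z]\.?)+)\s?$'
# \3 is that first letter, \4 is the run of remaining initials.
rep = r'AN \1\nVN \2\nIT \3.\4\n'

result = re.sub(pat, rep, name)
print(result)
```

With the sample name, the last line of the output should read `IT C.T.`.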

Aleksei Zyrianov

score 1 · Answer 2 · answered Apr 18 '13 at 21:23

Instead of substituting, work with the match groups directly:

import re

name = r'Nelson, Craig T.'
pat = r'([^\W\d_]+),\s([^\W\d_]+\s?)\s(([A-Z]\.?)+)\s?$' 
fmt = 'AN {last}\nVN {first}\nsf {initials}\n'

mtch = re.match(pat, name)

last_name, first_name, mid_name = mtch.group(1, 2, 3)

parsed = fmt.format(last=last_name, first=first_name, initials=first_name[0] + '.' + mid_name)
print(parsed)
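
If you would still rather go through `re.sub`, a related option is to pass a function as the replacement: it receives the match object, so any group can be sliced with ordinary string operations. This is only a sketch. `build_record` is a hypothetical helper name, the pattern is the one from the question, and the `FN`/`IT` labels follow the desired output stated there.

```python
import re

name = 'Nelson, Craig T.'
pat = r'([^\W\d_]+),\s([^\W\d_]+\s?)\s(([A-Z]\.?)+)\s?$'

def build_record(m):
    """Build the replacement text from a match object (hypothetical helper)."""
    last, first, middle = m.group(1, 2, 3)
    # Slicing happens here, in plain Python, not in the replacement template.
    return 'AN {}\nFN {}\nIT {}.{}\n'.format(last, first, first[0], middle)

result = re.sub(pat, build_record, name)
print(result)
```

The string returned by the function is inserted as-is, so the newlines are written as real `\n` characters in a normal (non-raw) string.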

score 0 · Answer 3 · answered Apr 19 '13 at 00:57

I was going to say "oh, never mind", but you were all faster :-)

import re

name = r'Nelson, Craig T.'
pat = r'([^\W\d_]+),\s(([A-Z])[^\W\d_]+\s?)\s(([A-Z]\.?)+)\s?$'
rep = r'AN \1\nVN \2\nsf \3.\4\n'  

split = re.sub(pat, rep, name)
print(split)

This is just a slight variation on Alexey's suggestion. Here, I'd prefer to require a real capital as the first letter of the first name (VN).
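
To illustrate the difference, here is a small sketch comparing the two patterns. The lowercase sample name is made up for the comparison, and the names `pat_any` and `pat_cap` are only for illustration.

```python
import re

# Alexey's pattern: any letter may start the first name.
pat_any = r'([^\W\d_]+),\s(([^\W\d_])[^\W\d_]*\s?)\s(([A-Z]\.?)+)\s?$'
# The pattern above: the first name must start with an ASCII capital.
pat_cap = r'([^\W\d_]+),\s(([A-Z])[^\W\d_]+\s?)\s(([A-Z]\.?)+)\s?$'

for sample in ('Nelson, Craig T.', 'Nelson, craig T.'):
    any_hit = re.search(pat_any, sample) is not None
    cap_hit = re.search(pat_cap, sample) is not None
    print(sample, '| any letter:', any_hit, '| capital only:', cap_hit)
```

Both patterns accept the capitalised name, while only the first accepts the lowercase one. Note that `[A-Z]` is ASCII-only, so a first name starting with an accented capital would also be rejected by the second pattern.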
