2

Names in the form Nelson, Craig T. need to be split into

AN Nelson
FN Craig
IT C.T. 

IT means initials; note that the first initial is the first letter of FN (first name).

I already have a bunch of regex patterns. For this one, I suspect regex won't do, because you can't slice a backreference.

import re

name = r'Nelson, Craig T.'
pat = r'([^\W\d_]+),\s([^\W\d_]+\s?)\s(([A-Z]\.?)+)\s?$'
rep = r'AN \1\nVN \2\nsf \3\n'  

split = re.sub(pat, rep, name)
print(split)

will produce:

AN Nelson
FN Craig
IT T. 

Ideally I'd somehow slice the \2, add a full stop and stick \3 behind it. I think this is not possible with regex and I should use a string operation, HOWEVER, it wouldn't be the first time I'd learn a trick here that I hadn't deduced from the documentation. (Thanks guys.)

RolfBly
  • Just a general comment: do you really find `[^\W\d_]` more readable than `[a-zA-Z]`? I have to say I had to think about that character class for a few seconds. ;) – Martin Ender Apr 18 '13 at 21:22
  • @m.buettner I think that the answer could be found in an answer to another question of the author: [Python 3 regex with diacritics and ligatures](http://stackoverflow.com/questions/15936315/python-3-regex-with-diacritics-and-ligatures) – Aleksei Zyrianov Apr 18 '13 at 21:32
  • @Alexey fair enough... I thought Python's built-in character classes only use Unicode properties if used with the `re.U` modifier. – Martin Ender Apr 18 '13 at 21:45
  • @M.Büttner Alexey is exactly right – RolfBly Apr 19 '13 at 00:48
  • @RolfBly Please don't forget to choose the right answer if your question was solved ;) – Aleksei Zyrianov Apr 24 '13 at 14:31

3 Answers

4

You may use one more group for the first initial like this:

pat = r'([^\W\d_]+),\s(([^\W\d_])[^\W\d_]*\s?)\s(([A-Z]\.?)+)\s?$'
rep = r'AN \1\nVN \2\nIT \3.\4\n' 

I've also corrected having sf instead of IT for initials in the rep variable.
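As a quick check of the idea, here is a minimal runnable sketch of the pattern above. It relies on the fact that groups are numbered by the position of their opening parenthesis, so the nested group around the first letter becomes group 3 and the initials move to group 4. The variable names `m` and `result` are just for illustration.

```python
import re

# Groups are numbered by the position of their opening parenthesis:
#   1 = last name, 2 = first name, 3 = first letter of the first name,
#   4 = remaining initials, 5 = last repetition of the inner initial group
pat = r'([^\W\d_]+),\s(([^\W\d_])[^\W\d_]*\s?)\s(([A-Z]\.?)+)\s?$'
rep = r'AN \1\nVN \2\nIT \3.\4\n'

name = 'Nelson, Craig T.'

m = re.match(pat, name)
print(m.groups())   # ('Nelson', 'Craig', 'C', 'T.', 'T.')

result = re.sub(pat, rep, name)
print(result)
# AN Nelson
# VN Craig
# IT C.T.
```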

Aleksei Zyrianov
1

Instead of substituting, work with the match groups directly:

import re

name = r'Nelson, Craig T.'
pat = r'([^\W\d_]+),\s([^\W\d_]+\s?)\s(([A-Z]\.?)+)\s?$' 
fmt = 'AN {last}\nVN {first}\nsf {initials}\n'

mtch = re.match(pat, name)

last_name, first_name, mid_name = mtch.group(1, 2, 3)

parsed = fmt.format(last=last_name, first=first_name, initials=first_name[0]+'.'+mid_name)
print(parsed)
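A related option, which none of the posts here mention, is that `re.sub` also accepts a function as the replacement. The function receives the match object, so a captured group can be sliced like any other string. A minimal sketch, assuming the AN/FN/IT labels from the question; `format_name` is a hypothetical helper name:

```python
import re

# Same pattern as in the question
pat = r'([^\W\d_]+),\s([^\W\d_]+\s?)\s(([A-Z]\.?)+)\s?$'

def format_name(m):
    """Build the replacement text from a match object (hypothetical helper)."""
    last, first, middle = m.group(1, 2, 3)
    # Slice the first name to get its initial, then append the other initials
    return 'AN {}\nFN {}\nIT {}.{}\n'.format(last, first, first[0], middle)

split = re.sub(pat, format_name, 'Nelson, Craig T.')
print(split)
# AN Nelson
# FN Craig
# IT C.T.
```

The string returned by the function is inserted literally, so it does not go through backreference or escape processing.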
J0HN
0

I was going to say "Oh, never mind", but you were all faster :-)

import re

name = r'Nelson, Craig T.'
pat = r'([^\W\d_]+),\s(([A-Z])[^\W\d_]+\s?)\s(([A-Z]\.?)+)\s?$'
rep = r'AN \1\nVN \2\nsf \3.\4\n'  

split = re.sub(pat, rep, name)
print(split)

This is just a slight variation on Alexey's suggestion. Here, I'd prefer a real capital for the first letter of first name (VN).
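One thing to be aware of with this variation: `[A-Z]` only covers ASCII capitals, so a first name starting with an accented capital would not match. The sketch below compares both patterns on a made-up name (the name and the variable names are assumptions for illustration only).

```python
import re

# Pattern from this answer: the first name must start with an ASCII capital
ascii_pat = r'([^\W\d_]+),\s(([A-Z])[^\W\d_]+\s?)\s(([A-Z]\.?)+)\s?$'
# Variant that accepts any letter as the first character of the first name
any_pat = r'([^\W\d_]+),\s(([^\W\d_])[^\W\d_]*\s?)\s(([A-Z]\.?)+)\s?$'

name = 'Dupont, \u00c9mile T.'  # hypothetical name; \u00c9 is the letter É

ascii_match = re.match(ascii_pat, name)
any_match = re.match(any_pat, name)

print(ascii_match)                    # None: É is not in [A-Z]
first_letter = any_match.group(3)
print(first_letter.isupper())         # True: capitalisation can be checked afterwards
```

If a real capital is required, checking `str.isupper()` on the captured letter is one way to keep the Unicode-aware character class.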

RolfBly