I was trying to remove middle initials from a list of names so that they all conform to FirstName space LastName. So I tried writing a regular expression that I could use to match list items that have a middle initial and then replace the initial with '' (an empty string).

Here is my code:

import re

list = ['John A Appleseed', 'Bonnie N Clyde', 'Joseph B Barthalomew', 'John Smith']

mid_name = re.compile(r'\s+[A-Z]\s+')

for idx, names in enumerate(list):
    if re.match(mid_name, names) is not None:
        list[idx] = mid_name.sub('', names)

print(list)

My results were:

['John A Appleseed', 'Bonnie N Clyde', 'Joseph B Barthalomew', 'John Smith']

I then changed my regular expression to:

mid_name = re.compile(r'\w+\s+[A-Z]\s+\w+')

And got:

['', '', '', 'John Smith']

Then changed the regular expression to:

mid_name = re.compile(r'[A-Z]\s+')

because I realized I wanted to keep at least one of those spaces anyway, but still got:

['John A Appleseed', 'Bonnie N Clyde', 'Joseph B Barthalomew', 'John Smith']

What is it that I'm missing? I feel like I'm really close to my solution, but it's eluding me. Any assistance would be appreciated.

1 Answer

You're using re.match when you should be using re.search.

According to the documentation, match only matches at the beginning of the string whereas search matches anywhere in the string.
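
To make the difference concrete, here is a small sketch using the first pattern from the question on one of the names. The variable names at_start and anywhere are just illustrative:

```python
import re

mid_name = re.compile(r'\s+[A-Z]\s+')
name = 'John A Appleseed'

# match only tries at position 0, where the string starts with 'J', not whitespace
at_start = re.match(mid_name, name)

# search scans the whole string and finds the initial with its surrounding spaces
anywhere = re.search(mid_name, name)

print(at_start)          # None
print(anywhere.group())  # ' A '
```

This is also why the second pattern in the question matched: it starts with \w+, so it can match from the beginning of the string, and the substitution then replaced the entire name.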

Another thing to note: you don't need to use re.match or re.search when you have a compiled regular expression (made with re.compile). You can do this instead:

mid_name = re.compile(r'\s+[A-Z]\s+')
mid_name.search(name)

You also probably don't need to check for a match before performing a substitution. That extra step is unnecessary because a substitution that doesn't actually substitute will give you the original string. So compile and then sub only (don't search).
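
A quick sketch of sub on its own, with and without a match. Note that I'm substituting a single space rather than '' here; that's my assumption based on your comment about wanting to keep one of the spaces (with '' the first result would be 'JohnAppleseed'):

```python
import re

mid_name = re.compile(r'\s+[A-Z]\s+')

# The initial and both runs of whitespace are replaced by one space
with_initial = mid_name.sub(' ', 'John A Appleseed')

# Nothing matches, so sub returns the original string unchanged
without_initial = mid_name.sub(' ', 'John Smith')

print(with_initial)     # John Appleseed
print(without_initial)  # John Smith
```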


Unrelated to your issue: you might want to consider changing your variable names some.

The list name is already used to represent the list data type so you're shadowing that name. You might consider renaming list to names and renaming names to name (since that variable represents just a single name).
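
Putting those suggestions together, the whole thing might look something like this. It's only a sketch: the list comprehension and the ' ' replacement are my choices, not something your original code requires:

```python
import re

names = ['John A Appleseed', 'Bonnie N Clyde', 'Joseph B Barthalomew', 'John Smith']

mid_name = re.compile(r'\s+[A-Z]\s+')

# Substitute directly; names without a middle initial come back unchanged
names = [mid_name.sub(' ', name) for name in names]

print(names)
# ['John Appleseed', 'Bonnie Clyde', 'Joseph Barthalomew', 'John Smith']
```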

Trey Hunner