I have an assignment that requires me to use regular expressions in python to find alliterative expressions in a file that consists of a list of names. Here are the specific instructions: " Open a file and return all of the alliterative names in the file. For our purposes a "name" is a two sequences of letters separated by a space, with capital letters only in the leading positions. We call a name alliterative if the first and last names begin with the same letter, with the exception that s and sh are considered distinct, and likewise for c/ch and t/th.The names file will contain a list of strings separated by commas.Suggestion: Do this in two stages." This is my attempt so far:
def check(regex, string, flags=0):
return not (re.match("(?:" + regex + r")\Z", string, flags=flags)) is None
def alliterative(names_file):
f = open(names_file)
string = f.read()
lst = string.split(',')
lst2 = []
for i in lst:
x=lst[i]
if re.search(r'[A-Z][a-z]* [A-Z][a-z]*', x):
k=x.split(' ')
if check('{}'.format(k[0][0]), k[1]):
if not check('[cst]', k[0][0]):
lst2.append(x)
elif len(k[0])==1:
if len(k[1])==1:
lst2.append(x)
elif not check('h',k[1][1]):
lst2.append(x)
elif len(k[1])==1:
if not check('h',k[0][1]):
lst2.append(x)
return lst2
There are two issues that I have: first, what I coded seems to make sense to me, the general idea behind it is that I first check that the names are in the correct format (first name, last name, all letters only, only first letters of first and last names capitalized), then check to see if the starting letters of the first and last names match, then see if those first letters are not c s or t, if they aren't we add the name to the new list, if they are, we check to see that we aren't accidentally matching a [cst] with an [cst]h. The code compiles but when I tried to run it on this list of names: Umesh Vazirani, Vijay Vazirani, Barbara Liskov, Leslie Lamport, Scott Shenker, R2D2 Rover, Shaq, Sam Spade, Thomas Thing
it returns an empty list instead of ["Vijay Vazirani", "Leslie Lamport", "Sam Spade", "Thomas Thing"] which it is supposed to return. I added print statements to alliterative so see where things were going wrong and it seems that the line if check('{}'.format(k[0][0]), k[1]): is an issue.
More than the issues with my program though, I feel like I am missing the point of regular expressions: am I overcomplicating this? Is there a nicer way to do this with regular expressions?