I'm trying to parse strings that contain a name, optionally followed by one or more degrees. I have a long list of these. Some contain no degrees, some contain one, and some contain multiple.
Example strings:
Sam da Man J.D.
Green Eggs Jr. Ed.M.
Argle Bargle Sr. MA
Cersei Lannister M.A. Ph.D.
As far as I can tell, the degrees come in the following patterns:
x.x.
x.x.x.
x.x.xx.
x.xx.
xx.x.
x.xxx.
two capital letters (e.g. 'MA')
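To make the patterns above concrete, they could be collapsed into one regex roughly like this. This is only a sketch under my own assumptions: each "x" is a letter, a degree is either two or three dot-terminated chunks of one to three letters or exactly two capitals, and the names `DEGREE_RE` and `is_degree` are made up for illustration.

```python
import re

# Assumption: a degree is 2-3 chunks of 1-3 letters, each ending in a dot
# (covers x.x., x.x.x., x.x.xx., x.xx., xx.x., x.xxx.), or exactly two capitals.
DEGREE_RE = re.compile(r"(?:[A-Za-z]{1,3}\.){2,3}|[A-Z]{2}")

def is_degree(token):
    """Return True if the whole token looks like one of the degree patterns."""
    return DEGREE_RE.fullmatch(token) is not None

for token in ["J.D.", "Ed.M.", "MA", "Ph.D.", "M.Div.", "B.S.Ed.", "Jr.", "Sr.", "Man"]:
    print(token, is_degree(token))
```

Suffixes such as "Jr." and "Sr." have only one dotted chunk, so they are not matched, but a two-letter all-caps name part would be mistaken for a degree.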
How would I parse this?
I'm new to regex, and breaking this problem down has proved very time-consuming. Following this post, I tried split = re.split(r'\s+|([.])', s)
and split = re.split(r'\s+|\.', s)
but both still split on the first space (and on every later one), so the name gets broken apart.
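For reference, here is a small check of what the second attempt returns on one of the example strings. The variable names are just for illustration.

```python
import re

s = "Sam da Man J.D."

# Splitting on whitespace OR a literal dot cuts the name at every space
# and also cuts the degree at every dot.
parts = re.split(r"\s+|\.", s)
print(parts)  # ['Sam', 'da', 'Man', 'J', 'D', '']
```

The pattern treats spaces and dots as equally valid separators, so neither the name nor the degree survives as a single piece.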
In response to the first comment, I have thought about the degree designations. I've been trying to write a regex that recognizes 'x.x' followed by a wildcard, because several of the degree patterns share that prefix: x.x., x.x.x., x.x.xx.
That would still leave a few more patterns to classify.
Alternatively, classifying the name might be easier?
Or even listing the degrees in a collection and searching for them?
{'M.A.T.', 'Ph.D.', 'MA', 'J.D.', 'Ed.M.', 'M.A.', 'M.B.A.', 'Ed.S.', 'M.Div.', 'M.Ed.', 'RN', 'B.S.Ed.'}
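The collection idea might look something like the sketch below. It assumes degrees only ever appear at the end of the string and are separated by whitespace. `KNOWN_DEGREES` and `split_name_and_degrees` are hypothetical names.

```python
# Hypothetical lookup set built from the degrees listed above.
KNOWN_DEGREES = {
    "M.A.T.", "Ph.D.", "MA", "J.D.", "Ed.M.", "M.A.",
    "M.B.A.", "Ed.S.", "M.Div.", "M.Ed.", "RN", "B.S.Ed.",
}

def split_name_and_degrees(s):
    """Peel known degrees off the end of the string; the rest is the name."""
    tokens = s.split()
    degrees = []
    # Assumption: degrees only trail the name, so walk backwards until
    # the last remaining token is not a known degree.
    while tokens and tokens[-1] in KNOWN_DEGREES:
        degrees.insert(0, tokens.pop())
    return " ".join(tokens), degrees

for line in ["Sam da Man J.D.", "Green Eggs Jr. Ed.M.",
             "Argle Bargle Sr. MA", "Cersei Lannister M.A. Ph.D."]:
    print(split_name_and_degrees(line))
```

This keeps "Jr." and "Sr." with the name because they are not in the set, but it only recognizes degrees that have been listed in advance.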