I have a re.findall() call searching for a pattern in Python, but it returns some undesired results and I want to know how to exclude them. I want to get the names from the text below, and my statement re.findall(r'([A-Z]{4,} \w. \w*|[A-Z]{4,} \w*)', text) is returning this:
['ERIN E. SCHNEIDER',
'MONIQUE C. WINKLER',
'JASON M. HABERMEYER',
'MARC D. KATZ',
'JESSICA W. CHAN',
'RAHUL KOLHATKAR',
'TSPU or taken',
'TSPU or the',
'TSPU only',
'TSPU was',
'TSPU and']
I want to get rid of the items that start with "TSPU". Does anyone know how to do that? Here is the text:
JINA L. CHOI (NY Bar No. 2699718)
ERIN E. SCHNEIDER (Cal. Bar No. 216114) schneidere@sec.gov
MONIQUE C. WINKLER (Cal. Bar No. 213031) winklerm@sec.gov
JASON M. HABERMEYER (Cal. Bar No. 226607) habermeyerj@sec.gov
MARC D. KATZ (Cal. Bar No. 189534) katzma@sec.gov
JESSICA W. CHAN (Cal. Bar No. 247669) chanjes@sec.gov
RAHUL KOLHATKAR (Cal. Bar No. 261781) kolhatkarr@sec.gov
- The Investor Solicitation Process Generally Included a Face-to-Face Meeting, a Technology Demonstration, and a Binder of Materials [...]
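One possible direction, as a minimal sketch rather than a definitive fix: the "TSPU" items seem to get through because \w* also matches lowercase words, and the unescaped . matches any character. Assuming every name is written fully in upper case with an optional middle initial, the pattern can require upper-case letters throughout. The helper name find_names and the short sample string are hypothetical; the two "TSPU" fragments are copied from the unwanted output above and stand in for the body text, which is not shown in full.

```python
import re

# Assumption: a name is an upper-case first name of 4+ letters (as in the
# original pattern), an optional middle initial such as "E.", and an
# upper-case last name of 2+ letters.
NAME_RE = re.compile(r"\b[A-Z]{4,} (?:[A-Z]\. )?[A-Z]{2,}\b")


def find_names(text):
    """Return all upper-case names found in text (hypothetical helper)."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return NAME_RE.findall(text)


# A few lines from the question plus two of the unwanted "TSPU" fragments.
sample = """JINA L. CHOI (NY Bar No. 2699718)
ERIN E. SCHNEIDER (Cal. Bar No. 216114) schneidere@sec.gov
RAHUL KOLHATKAR (Cal. Bar No. 261781) kolhatkarr@sec.gov
TSPU or taken
TSPU was
"""

print(find_names(sample))
# ['JINA L. CHOI', 'ERIN E. SCHNEIDER', 'RAHUL KOLHATKAR']
```

"TSPU or taken" no longer matches because the word after "TSPU" has to be upper case. If the full text also contains upper-case headings, this sketch could still pick those up, so a stricter condition (for example, requiring the name to start a line) might be needed.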