I have a series of text files formatted as follows:
text = 'COMPANY NAME: Ruff name of company TYPE OF EVENT: Party NOTIFIED DATE: 1/27/20 COMPANY NAME: Company2/CPT TYPE OF EVENT: Fire NOTIFIED DATE: 1/31/20'
I eventually need to get these into a pandas DataFrame where COMPANY NAME, TYPE OF EVENT, and NOTIFIED DATE are the column headers and the text in between fills the rows. A first step is just to figure out how to split the text wherever there is a ":" preceded by one or more all-caps words, so that I get output like:
res = ['COMPANY NAME', 'Ruff name of company', 'TYPE OF EVENT', 'Party', ...]
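One possible way to get that list, as a sketch: `re.split` keeps the text of any capturing groups in the pattern, so splitting on a grouped label pattern returns labels and values interleaved. The pattern encodes an assumption about the data: a label is one or more all-caps words separated by single spaces, ends in a colon, and starts at the beginning of the text or right after whitespace. That last condition (the `(?<!\S)` lookbehind) keeps `Company2/CPT TYPE OF EVENT:` from being read as the label `CPT TYPE OF EVENT`. `LABEL` and `parts` are illustrative names.

```python
import re

text = ('COMPANY NAME: Ruff name of company TYPE OF EVENT: Party '
        'NOTIFIED DATE: 1/27/20 COMPANY NAME: Company2/CPT '
        'TYPE OF EVENT: Fire NOTIFIED DATE: 1/31/20')

# Assumed label shape: all-caps words joined by single spaces, followed by
# a colon, not preceded by a non-space character. The capturing group makes
# re.split return the label itself between the surrounding values.
LABEL = re.compile(r'(?<!\S)([A-Z]+(?: [A-Z]+)*):\s*')

parts = [p.strip() for p in LABEL.split(text)]
# The text starts with a label, so re.split yields an empty string first.
res = parts[1:] if parts and parts[0] == '' else parts
print(res)
```

Labels land at even indexes and their values at the following odd index, so `res[0::2]` and `res[1::2]` pair them up.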
I am very new to regex and cannot figure out how to get this match to work. I tried the following:
re.findall('[A-Z]+[A-Z]+[A-Z]', text)
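That attempt matches only runs of three or more consecutive capitals. It therefore cannot cross the spaces inside `TYPE OF EVENT`, skips `OF` entirely, and also picks up capitals that are not labels, such as `CPT`. A sketch that goes straight to the DataFrame step, under the same assumed label shape as above (all-caps words, single spaces, trailing colon, starting after whitespace): `re.findall` with two groups returns `(label, value)` tuples, the value being matched lazily up to the next label or the end of the text, and a new row starts whenever a label repeats. `PAIR`, `pairs`, and `rows` are illustrative names.

```python
import re

text = ('COMPANY NAME: Ruff name of company TYPE OF EVENT: Party '
        'NOTIFIED DATE: 1/27/20 COMPANY NAME: Company2/CPT '
        'TYPE OF EVENT: Fire NOTIFIED DATE: 1/31/20')

# Group 1: an assumed all-caps label ending in ':'.
# Group 2: the value, matched lazily until a lookahead sees the next label
# (same assumed shape) or the end of the string.
PAIR = re.compile(
    r'(?<!\S)([A-Z]+(?: [A-Z]+)*):\s*(.*?)\s*(?=(?<!\S)[A-Z]+(?: [A-Z]+)*:|$)'
)

pairs = PAIR.findall(text)

rows = []
for label, value in pairs:
    # Start a new record when there is no row yet or this label is already
    # in the current row, i.e. the sequence of labels has started over.
    if not rows or label in rows[-1]:
        rows.append({})
    rows[-1][label] = value

print(rows)
```

Passing `rows` to `pandas.DataFrame(rows)` should then give one row per record, with the labels as column headers. Note that splitting is inherently ambiguous if a value ends in an all-caps word (for example a value `ACME` directly before `TYPE OF EVENT:` would be read as part of the label), so that case would need a fixed list of known labels instead.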
I recognize I'm not even close. I have also looked at many similar questions but couldn't adapt them to my use case.
Other posts:
Capture all consecutive all-caps words with regex in python?
Python Regex catch multi caps words and adjacent words
Find the line with all caps in Regex Python
Any help would be appreciated, thanks!