Extracting words between two strings in ProperCase and Line breaks

Asked Jun 10 '20 at 18:49

Active Jun 10 '20 at 19:03

Viewed 22 times

I have the following dataframe. Sorry for the mess (it was scraped from a website):

df = pd.DataFrame({'TEXT': ['Product Type:\n \n \n Fish\n \n \n \n \n Variety:\n \n \n Salmon\n \n \n \n \n Style:\n \n \n FROZEN\n \n \n \n \n Shape:\n \n \n Sliced\n \n \n \n \n Part:\n \n \n Fillet\n \n \n','Product Type:\n \n \n Fish\n \n \n \n \n Variety:\n \n \n Salmon\n \n \n \n \n Style:\n \n \n FROZEN\n \n \n \n \n Freezing Process:\n \n \n IQF\n \n \n \n \n Shape:\n \n \n Block\n \n \n \n \n Part:\n \n \n Body\n \n \n \n \n Certification:\n \n \n BRC, FDA, HACCP\n']})

and I want to extract the different parameters. For example, I would like the output to be

df['ProductType']="Fish"

I tried this:

df['ProductType']=df['TEXT'].str.extract("(?=Type\:)(.*)(?=Variety\:)").astype(str)

but it just outputs NaNs. Sorry if this is too obvious; I'm starting with regex today.

edited Jun 10 '20 at 19:03

Gustavo Moreno

Can you show your expected output? Preferably use a more obvious and less noisy input string that still represents your data, if possible. – ggorlen Jun 10 '20 at 18:53

sure, I'll fix that. thanks! – Gustavo Moreno Jun 10 '20 at 19:01

`"(?s)Type:(.*?)Variety:"` – Wiktor Stribiżew Jun 10 '20 at 19:05
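
For context on the pattern in the comment above: the original attempt likely returns NaN because `.` does not match newline characters by default, so `(.*)` cannot span the line breaks between "Type:" and "Variety:". The `(?s)` flag changes that, and `(.*?)` keeps the match lazy. A minimal sketch of how this could be applied, assuming a shortened stand-in for the scraped text (fewer fields than the real data) and adding `.str.strip()` to drop the surrounding whitespace, which is not part of the comment:

```python
import pandas as pd

# Shortened stand-in for the scraped text in the question (same layout, fewer fields).
df = pd.DataFrame({'TEXT': [
    'Product Type:\n \n \n Fish\n \n \n \n \n Variety:\n \n \n Salmon\n \n \n',
    'Product Type:\n \n \n Fish\n \n \n \n \n Variety:\n \n \n Salmon\n \n \n \n \n Style:\n \n \n FROZEN\n',
]})

# (?s) lets "." match newlines; (.*?) lazily captures everything between the two labels.
pattern = r"(?s)Type:(.*?)Variety:"

# expand=False returns a Series for a single capture group;
# .str.strip() removes the newlines and spaces around the captured value.
df['ProductType'] = df['TEXT'].str.extract(pattern, expand=False).str.strip()

print(df['ProductType'].tolist())  # ['Fish', 'Fish']
```

The same idea should carry over to the other fields by swapping the two labels, e.g. `r"(?s)Variety:(.*?)Style:"`, as long as the labels always appear in that order in every row.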

0 Answers