
I have the following DataFrame (sorry for the mess; it was scraped from a website):

df = pd.DataFrame({'TEXT': ['Product Type:\n \n \n Fish\n \n \n \n \n Variety:\n \n \n Salmon\n \n \n \n \n Style:\n \n \n FROZEN\n \n \n \n \n Shape:\n \n \n Sliced\n \n \n \n \n Part:\n \n \n Fillet\n \n \n','Product Type:\n \n \n Fish\n \n \n \n \n Variety:\n \n \n Salmon\n \n \n \n \n Style:\n \n \n FROZEN\n \n \n \n \n Freezing Process:\n \n \n IQF\n \n \n \n \n Shape:\n \n \n Block\n \n \n \n \n Part:\n \n \n Body\n \n \n \n \n Certification:\n \n \n BRC, FDA, HACCP\n']})

I want to extract the different parameters. For example, I would like the output to be:

df['ProductType']="Fish"

I tried this:

df['ProductType']=df['TEXT'].str.extract("(?=Type\:)(.*)(?=Variety\:)").astype(str)

but it just outputs NaNs. Sorry if this is too obvious; I only started with regex today.
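For reference, a likely cause of the NaNs is that `.` does not match newline characters by default, so `.*` cannot span the line breaks between `Type:` and `Variety:`. Below is a minimal sketch of one possible approach, not necessarily the best one. It assumes every field has the form `Label:` followed by whitespace and then a single-line value. The sample rows are a trimmed version of the data above, and `parse_fields` and `fields` are hypothetical names introduced only for this illustration.

```python
import re
import pandas as pd

# Trimmed sample in the same shape as the scraped data (assumption: "Label:" then a one-line value)
df = pd.DataFrame({'TEXT': [
    'Product Type:\n \n \n Fish\n \n \n \n \n Variety:\n \n \n Salmon\n \n \n',
    'Product Type:\n \n \n Fish\n \n \n \n \n Shape:\n \n \n Block\n \n \n \n \n '
    'Certification:\n \n \n BRC, FDA, HACCP\n',
]})

# Single field: \s* skips the blank lines after the label, [^\n]+ captures the value line
df['ProductType'] = (
    df['TEXT'].str.extract(r'Product Type:\s*([^\n]+)', expand=False).str.strip()
)

# All fields at once: collect every "Label: value" pair of a row into a dict
def parse_fields(text):
    pairs = re.findall(r'([^\n:]+):\s*([^\n]+)', text)
    return {key.strip(): value.strip() for key, value in pairs}

# One column per label; labels missing from a row become NaN
fields = pd.DataFrame(df['TEXT'].map(parse_fields).tolist(), index=df.index)
print(fields)
```

The second part avoids writing one pattern per label, at the cost of assuming that labels never contain a colon and that values never wrap onto a second line.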
