I have the following DataFrame (sorry for the mess; it was scraped from a website):
import pandas as pd
df = pd.DataFrame({'TEXT': ['Product Type:\n \n \n Fish\n \n \n \n \n Variety:\n \n \n Salmon\n \n \n \n \n Style:\n \n \n FROZEN\n \n \n \n \n Shape:\n \n \n Sliced\n \n \n \n \n Part:\n \n \n Fillet\n \n \n','Product Type:\n \n \n Fish\n \n \n \n \n Variety:\n \n \n Salmon\n \n \n \n \n Style:\n \n \n FROZEN\n \n \n \n \n Freezing Process:\n \n \n IQF\n \n \n \n \n Shape:\n \n \n Block\n \n \n \n \n Part:\n \n \n Body\n \n \n \n \n Certification:\n \n \n BRC, FDA, HACCP\n']})
I want to extract the different parameters from it. For example, I would like the output to be:
df['ProductType']="Fish"
I tried this:
df['ProductType']=df['TEXT'].str.extract("(?=Type\:)(.*)(?=Variety\:)").astype(str)
but it just outputs NaNs. Sorry if this is too obvious; I only started with regex today.
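A likely cause, offered as a sketch rather than a definitive answer: by default `.` does not match a newline, so `(.*)` cannot span the line breaks between "Type:" and "Variety:", the trailing lookahead never succeeds, and nothing is extracted. One possible alternative is to match the label, skip the whitespace after it with `\s*`, and capture the next non-empty line. The sample strings below are shortened from the data above, and `parse_fields` and `parsed` are hypothetical names introduced only for illustration.

```python
import re

import pandas as pd

# Shortened sample rows, assumed to have the same "Label:\n ... value\n" layout
df = pd.DataFrame({'TEXT': [
    'Product Type:\n \n \n Fish\n \n \n \n \n Variety:\n \n \n Salmon\n'
    ' \n \n \n \n Style:\n \n \n FROZEN\n \n \n',
    'Product Type:\n \n \n Fish\n \n \n \n \n Freezing Process:\n \n \n IQF\n'
    ' \n \n \n \n Certification:\n \n \n BRC, FDA, HACCP\n',
]})

# Single field: \s* skips the blank lines after the label,
# ([^\n]+) captures the first line that has content.
df['ProductType'] = (
    df['TEXT']
    .str.extract(r'Product Type:\s*([^\n]+)', expand=False)
    .str.strip()
)


def parse_fields(text):
    """Return a {label: value} dict for every 'Label: value' pair in text."""
    pairs = re.findall(r'([^\n:]+):\s*([^\n]+)', text)
    return {label.strip(): value.strip() for label, value in pairs}


# All fields at once: one column per label, NaN where a row lacks that label.
parsed = pd.DataFrame(df['TEXT'].map(parse_fields).tolist(), index=df.index)
print(parsed)
```

Passing `flags=re.DOTALL` to the original pattern would let `.` cross newlines, but the captured group would then also include the "Type:" label and all the surrounding whitespace, which is why this sketch anchors on the label and captures only the value line.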