I have a pandas DataFrame with a complex text column. The productInfo column holds several kinds of content:
#Input type1
df['productInfo'][0]
#Output type1
'Salt & pepper shakers,Material: stoneware,Dimensions: H6.5cm,Dachshund designs,1x black and tan, 1x brown,Hand painted,Dishwasher safe'
#Output type2
'Pineapple string lights,Dimensions: 400x6x10cm,10 pineapple shaped LED lights,In a gold hue,3x AA batteries required (not included)'
#Output type3
''
So essentially my productInfo column contains the above 3 kinds of content. What I want is to extract the Material value from productInfo into its own column so that I can use it in a groupby analysis. The value should only be filled in when it exists; when it doesn't, the new column should just be null/None or similar.
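To make the goal concrete, here is a minimal sketch of the result I'm after, rebuilt from the three sample values above. It assumes the material always appears as "Material: <value>" and ends at the next comma, which may not hold for every row in the real data. The regex and the `counts` name are only illustrative.

```python
import pandas as pd

# The three sample productInfo values from above.
df = pd.DataFrame({
    "productInfo": [
        "Salt & pepper shakers,Material: stoneware,Dimensions: H6.5cm,"
        "Dachshund designs,1x black and tan, 1x brown,Hand painted,Dishwasher safe",
        "Pineapple string lights,Dimensions: 400x6x10cm,10 pineapple shaped LED lights,"
        "In a gold hue,3x AA batteries required (not included)",
        "",
    ]
})

# Assumption: the value runs from "Material:" up to the next comma.
# Rows without a match (including empty strings) come back as NaN.
df["Material"] = df["productInfo"].str.extract(r"Material:\s*([^,]+)", expand=False)

# dropna=False keeps rows with no material as their own group (pandas >= 1.1).
counts = df.groupby("Material", dropna=False).size()
print(df["Material"])
print(counts)
```

Since `str.extract` already returns NaN for rows that don't match, no boolean mask should be needed for this step.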
I have tried boolean masks but can't seem to make them work. Any suggestions would be highly appreciated.
Thanks in advance
Edit: this was my original df: [image: original df]
My df after extracting Material from productInfo: [image: df after extracting Material from productInfo]
My df after extracting Material and Dimensions from productInfo: [image]
Hopefully this gives you an idea of what I'm trying to achieve. Most of my task is text extraction from complex text blobs inside each column: if a regex finds the relevant field in the text, I fill in the new column; otherwise I set it to null. This has proven to be a big challenge. If anyone can help me extract useful info like Material and Dimensions from the productInfo text into their own columns, that would be very helpful.
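For several fields at once, something along these lines is roughly what I have in mind. It is only a sketch: `extract_fields` is a hypothetical helper name, and it assumes every field is written as "Label: value" and separated from the next item by a comma, as in the samples above.

```python
import re
import pandas as pd

def extract_fields(series, labels):
    """Return one column per label; NaN where the label is absent from the text."""
    out = pd.DataFrame(index=series.index)
    for label in labels:
        # Assumption: the label starts the string or follows a comma,
        # and its value ends at the next comma.
        pattern = rf"(?:^|,)\s*{re.escape(label)}:\s*([^,]+)"
        out[label] = series.str.extract(pattern, expand=False).str.strip()
    return out

df = pd.DataFrame({
    "productInfo": [
        "Salt & pepper shakers,Material: stoneware,Dimensions: H6.5cm,"
        "Dachshund designs,1x black and tan, 1x brown,Hand painted,Dishwasher safe",
        "Pineapple string lights,Dimensions: 400x6x10cm,10 pineapple shaped LED lights,"
        "In a gold hue,3x AA batteries required (not included)",
        "",
    ]
})

# Add the extracted columns next to the original text column.
df = df.join(extract_fields(df["productInfo"], ["Material", "Dimensions"]))
print(df[["Material", "Dimensions"]])
```

Looping over labels keeps each pattern simple and makes it easy to add more fields later, at the cost of one pass over the column per label.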
Thanks in advance to anyone who tries to help, and sorry for my vague question without the relevant information.
Happy Panda-ing (if that's a word!!) :)