1

It's solution is definitely out there but I couldn't find it. So posting it here.
I have a dataframe which is like

  object_Id object_detail
0     obj00       red mug
1     obj01      red bowl
2     obj02     green mug
3     obj03     white candle holder

I want to split the column object_details into two columns: name, object_color based on a list that contains the color name

COLOR = ['red', 'green', 'blue', 'white']
print(df)
# want to perform some operation so that It'll get output 
  object_Id object_detail           object_color          name
0     obj00       red mug              red                 mug
1     obj01      red bowl              red                bowl
2     obj02     green mug              green               mug
3     obj03     white candle holder    white     candle holder

This is my first time using dataframe so I am not sure how to achieve it using pandas. I can achieve it by converting it into a list and then apply a filter. But I think there are easier ways out there that I might miss. Thanks

LiMuBei
  • 2,868
  • 22
  • 27
Pranjal Doshi
  • 862
  • 11
  • 29
  • You need spit by spaces? If need something more complicated like `df[['object_color','name']] = df['object_detail'].str.split(n=1, expand=True)` can you explain more? – jezrael Sep 30 '21 at 06:43
  • Not necessarily. Object name can contain space `candle holder`. What I want to do is filter it from list which contains color name. Updated the example – Pranjal Doshi Sep 30 '21 at 06:47
  • Yop, then my solution working nice, because split by first space. – jezrael Sep 30 '21 at 06:48
  • yes, it will work in usualy case but will raise an issue when color names are `Aero blue` or `Atomic tangerine` from the list. reference color list I'm using https://en.wikipedia.org/wiki/List_of_colors:_A%E2%80%93F – Pranjal Doshi Sep 30 '21 at 06:52
  • colors are always first in each strings? Or possible `mug red` ? There is always one color for each value, so not possible `red mug green` ? – jezrael Sep 30 '21 at 06:56
  • It can be `Barbie Pink frock` where `Barbie pink` is color. `frock` is object. Yes color is always first followed by name – Pranjal Doshi Sep 30 '21 at 06:59

1 Answers1

2

Use Series.str.extract with joined values of list by | for regex OR and then all another values in new column splitted by space:

pat = "|".join(COLOR)
df[['object_color','name']] = df['object_detail'].str.extract(f'({pat})\s+(.*)',expand=True)
print (df)
  object_Id        object_detail object_color           name
0     obj00    Barbie Pink frock  Barbie Pink          frock
1     obj01             red bowl          red           bowl
2     obj02            green mug        green            mug
3     obj03  white candle holder        white  candle holder
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252