2

This question is different from the other ones because in none of them the column name resides in the value of a key... Please look at the examples given before marking as duplicate.

I have a df like so:

df: col1 col2 col3
    100  200  [{'attribute': 'Pattern', 'value': 'Printed'},...

Closer look at column 3 looks like:

[{'attribute': 'Pattern', 'value': 'Printed'},
 {'attribute': 'Topwear style', 'value': 'T shirt'},
 {'attribute': 'Bottomwear Length', 'value': 'Short'},
 {'attribute': 'Colour Palette', 'value': 'Bright colours'},
 {'attribute': 'Bottomwear style', 'value': 'Baggy'},
 {'attribute': 'Topwear length', 'value': 'Waist'},
 {'attribute': 'Sleeve style', 'value': 'Sleeveless'},
 {'attribute': 'Type of pattern', 'value': 'Graphic print'},
 {'attribute': 'Neck', 'value': 'Round'},
 {'attribute': 'Level of embellishment', 'value': 'No'}]

Where each attribute is column name and each value, is the value for that column name.

The output will look something like this:

df: col1   col2    Pattern       Topwear Style       Bottomwear Length ....
    100    200     Printed       T shirt             Shorts

There are multiple rows with repeating and new attributes and values. How would I go about doing this in pandas? I tried searching for something similar but couldn't find anything useful.

piyush daga
  • 501
  • 4
  • 17

3 Answers3

3
x = df['col3'].tolist()
newcol = {item['attribute'] : [item['value']] for item in x }
newdf = pd.DataFrame(newcol)
del df['col3'] 
print(df.join(newdf, how='right'))

Output

   col1  col2  Pattern Topwear style Bottomwear Length  Colour Palette  \
0   100   200  Printed       T shirt             Short  Bright colours  
... 

dataframe for test.

data = {'col1':100, 'col2': 200, 'col3': [{'attribute': 'Pattern', 'value': 'Printed'},
 {'attribute': 'Topwear style', 'value': 'T shirt'},
 {'attribute': 'Bottomwear Length', 'value': 'Short'},
 {'attribute': 'Colour Palette', 'value': 'Bright colours'},
 {'attribute': 'Bottomwear style', 'value': 'Baggy'},
 {'attribute': 'Topwear length', 'value': 'Waist'},
 {'attribute': 'Sleeve style', 'value': 'Sleeveless'},
 {'attribute': 'Type of pattern', 'value': 'Graphic print'},
 {'attribute': 'Neck', 'value': 'Round'},
 {'attribute': 'Level of embellishment', 'value': 'No'}]}

df = pd.DataFrame(data)
ComplicatedPhenomenon
  • 4,055
  • 2
  • 18
  • 45
3

Try with:

df=df.join(pd.concat([pd.DataFrame(v).set_index('attribute').T 
               for v in df.pop('col3')]).reset_index(drop=True))

Setup:

d=[{'attribute': 'Pattern', 'value': 'Printed'},
 {'attribute': 'Topwear style', 'value': 'T shirt'},
 {'attribute': 'Bottomwear Length', 'value': 'Short'},
 {'attribute': 'Colour Palette', 'value': 'Bright colours'},
 {'attribute': 'Bottomwear style', 'value': 'Baggy'},
 {'attribute': 'Topwear length', 'value': 'Waist'},
 {'attribute': 'Sleeve style', 'value': 'Sleeveless'},
 {'attribute': 'Type of pattern', 'value': 'Graphic print'},
 {'attribute': 'Neck', 'value': 'Round'},
 {'attribute': 'Level of embellishment', 'value': 'No'}]
df=pd.DataFrame({'a':100,'b':200,'col3':[d]},index=[0])

Output:

enter image description here

anky
  • 74,114
  • 11
  • 41
  • 70
3

You can use nested list comprehension with dict comprehension for list of dictionaries possible pass to DataFrame constructor:

Advatage is better performance, disadvatage a bit complicated.

d = [{'attribute': 'Pattern', 'value': 'Printed'},
 {'attribute': 'Topwear style', 'value': 'T shirt'},
 {'attribute': 'Bottomwear Length', 'value': 'Short'},
 {'attribute': 'Colour Palette', 'value': 'Bright colours'}
]

df = pd.DataFrame({'col1':[100, 20], 'col2':[200, 10], 'col3':[d, d]})
print (df)

   col1  col2                                               col3
0   100   200  [{'attribute': 'Pattern', 'value': 'Printed'},...
1    20    10  [{'attribute': 'Pattern', 'value': 'Printed'},...

a = [{y['attribute']: y['value']  for y in x for k, v in y.items()} for x in df.pop('col3')]

df = df.join(pd.DataFrame(a))
print (df)
   col1  col2  Pattern Topwear style Bottomwear Length  Colour Palette
0   100   200  Printed       T shirt             Short  Bright colours
1    20    10  Printed       T shirt             Short  Bright colours
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    Why are you even doing the for k, v in y.tems() part ? – piyush daga Sep 05 '19 at 08:02
  • @piyushdaga - because it is flattenting, similar idea like for list from [this](https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-list-of-lists) – jezrael Sep 05 '19 at 08:06
  • @piyushdaga - it is not necesarry here, because is used `y['attribute']: y['value']` – jezrael Sep 05 '19 at 08:09