Convert a column with a list of dicts, where column name and value are both present as values inside dict keys

Question

This question is different from the other ones because in none of them the column name resides in the value of a key... Please look at the examples given before marking as duplicate.

I have a df like so:

df: col1 col2 col3
    100  200  [{'attribute': 'Pattern', 'value': 'Printed'},...

Closer look at column 3 looks like:

[{'attribute': 'Pattern', 'value': 'Printed'},
 {'attribute': 'Topwear style', 'value': 'T shirt'},
 {'attribute': 'Bottomwear Length', 'value': 'Short'},
 {'attribute': 'Colour Palette', 'value': 'Bright colours'},
 {'attribute': 'Bottomwear style', 'value': 'Baggy'},
 {'attribute': 'Topwear length', 'value': 'Waist'},
 {'attribute': 'Sleeve style', 'value': 'Sleeveless'},
 {'attribute': 'Type of pattern', 'value': 'Graphic print'},
 {'attribute': 'Neck', 'value': 'Round'},
 {'attribute': 'Level of embellishment', 'value': 'No'}]

Where each attribute is column name and each value, is the value for that column name.

The output will look something like this:

df: col1   col2    Pattern       Topwear Style       Bottomwear Length ....
    100    200     Printed       T shirt             Shorts

There are multiple rows with repeating and new attributes and values. How would I go about doing this in pandas? I tried searching for something similar but couldn't find anything useful.

What you have tried so far? – shaik moeed Sep 05 '19 at 07:17 — shaik moeed, Sep 05 '19 at 07:17
Nope, that's why I had to post it again – piyush daga Sep 05 '19 at 07:21 — piyush daga, Sep 05 '19 at 07:21

ComplicatedPhenomenon · Answer 1 · 2019-09-05T07:37:44.567

3

x = df['col3'].tolist()
newcol = {item['attribute'] : [item['value']] for item in x }
newdf = pd.DataFrame(newcol)
del df['col3'] 
print(df.join(newdf, how='right'))

Output

   col1  col2  Pattern Topwear style Bottomwear Length  Colour Palette  \
0   100   200  Printed       T shirt             Short  Bright colours  
...

dataframe for test.

data = {'col1':100, 'col2': 200, 'col3': [{'attribute': 'Pattern', 'value': 'Printed'},
 {'attribute': 'Topwear style', 'value': 'T shirt'},
 {'attribute': 'Bottomwear Length', 'value': 'Short'},
 {'attribute': 'Colour Palette', 'value': 'Bright colours'},
 {'attribute': 'Bottomwear style', 'value': 'Baggy'},
 {'attribute': 'Topwear length', 'value': 'Waist'},
 {'attribute': 'Sleeve style', 'value': 'Sleeveless'},
 {'attribute': 'Type of pattern', 'value': 'Graphic print'},
 {'attribute': 'Neck', 'value': 'Round'},
 {'attribute': 'Level of embellishment', 'value': 'No'}]}

df = pd.DataFrame(data)

edited Sep 05 '19 at 07:37

answered Sep 05 '19 at 07:29

ComplicatedPhenomenon

4,055
2
18
45

1

@jezrael sure, check the updation. – ComplicatedPhenomenon Sep 05 '19 at 07:39
ya, problem of answer is different format like OP, so failed for OP data. – jezrael Sep 05 '19 at 07:42
Your solution is definitely easier to understand for a pandas newbie like me. – piyush daga Sep 05 '19 at 07:43
1

Nope, easy as well, thanks for your answer, upvoted it. – piyush daga Sep 05 '19 at 07:53

score 3 · Accepted Answer · answered Sep 05 '19 at 07:36

Try with:

df=df.join(pd.concat([pd.DataFrame(v).set_index('attribute').T 
               for v in df.pop('col3')]).reset_index(drop=True))

Setup:

d=[{'attribute': 'Pattern', 'value': 'Printed'},
 {'attribute': 'Topwear style', 'value': 'T shirt'},
 {'attribute': 'Bottomwear Length', 'value': 'Short'},
 {'attribute': 'Colour Palette', 'value': 'Bright colours'},
 {'attribute': 'Bottomwear style', 'value': 'Baggy'},
 {'attribute': 'Topwear length', 'value': 'Waist'},
 {'attribute': 'Sleeve style', 'value': 'Sleeveless'},
 {'attribute': 'Type of pattern', 'value': 'Graphic print'},
 {'attribute': 'Neck', 'value': 'Round'},
 {'attribute': 'Level of embellishment', 'value': 'No'}]
df=pd.DataFrame({'a':100,'b':200,'col3':[d]},index=[0])

Output:

jezrael · Answer 3 · 2019-09-05T07:55:05.913

You can use nested list comprehension with dict comprehension for list of dictionaries possible pass to DataFrame constructor:

Advatage is better performance, disadvatage a bit complicated.

d = [{'attribute': 'Pattern', 'value': 'Printed'},
 {'attribute': 'Topwear style', 'value': 'T shirt'},
 {'attribute': 'Bottomwear Length', 'value': 'Short'},
 {'attribute': 'Colour Palette', 'value': 'Bright colours'}
]

df = pd.DataFrame({'col1':[100, 20], 'col2':[200, 10], 'col3':[d, d]})
print (df)

   col1  col2                                               col3
0   100   200  [{'attribute': 'Pattern', 'value': 'Printed'},...
1    20    10  [{'attribute': 'Pattern', 'value': 'Printed'},...

a = [{y['attribute']: y['value']  for y in x for k, v in y.items()} for x in df.pop('col3')]

df = df.join(pd.DataFrame(a))
print (df)
   col1  col2  Pattern Topwear style Bottomwear Length  Colour Palette
0   100   200  Printed       T shirt             Short  Bright colours
1    20    10  Printed       T shirt             Short  Bright colours

@piyushdaga - because it is flattenting, similar idea like for list from [this](https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-list-of-lists) — jezrael, Sep 05 '19 at 08:06
@piyushdaga - it is not necesarry here, because is used `y['attribute']: y['value']` — jezrael, Sep 05 '19 at 08:09

Convert a column with a list of dicts, where column name and value are both present as values inside dict keys

3 Answers3