This is related to: exploding a pandas dataframe column

Here's my dataframe:

import pandas as pd
import numpy as np
d = {'id': [1, 1, 1, 2, 2, 2], 'data': [{'foo':True}, {'foo':False, 'bar':True}, {'foo':True, 'bar':False, 'baz':True}, {'foo':False}, {'foo':False, 'bar':False}, {'foo':False, 'bar':True, 'baz':False}]}
df = pd.DataFrame(data=d)
df

I'd like to create a new column for each key that appears in the dictionaries of the data column, holding the corresponding True and False values (and np.nan wherever a key is missing from a row's dictionary).

My new dataframe would look like:

a = {'id': [1, 1, 1, 2, 2, 2], 'data': [{'foo':True}, {'foo':False, 'bar':True}, {'foo':True, 'bar':False, 'baz':True}, {'foo':False}, {'foo':False, 'bar':False}, {'foo':False, 'bar':True, 'baz':False}], 'foo':[True, False, True, False, False, False], 'bar':[np.nan, True, False, np.nan, False, True], 'baz':[np.nan, np.nan, True, np.nan, np.nan, False] }
df1 = pd.DataFrame(data=a)
df1

I'm not sure whether this can be achieved with Series.str.get_dummies, because I don't see how to map the True and False values. I'd appreciate any help!

cs95
Kvothe

2 Answers

Convert the column to a list of records with tolist(), build a DataFrame from that list, and join the result back onto the original frame:

# pd.concat([df, pd.DataFrame(df['data'].tolist())], axis=1)
df.join(pd.DataFrame(df['data'].tolist()))

   id                                       data    bar    baz    foo
0   1                              {'foo': True}    NaN    NaN   True
1   1                {'foo': False, 'bar': True}   True    NaN  False
2   1   {'foo': True, 'bar': False, 'baz': True}  False   True   True
3   2                             {'foo': False}    NaN    NaN  False
4   2               {'foo': False, 'bar': False}  False    NaN  False
5   2  {'foo': False, 'bar': True, 'baz': False}   True  False  False
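For a self-contained check, the following sketch rebuilds the question's frame and applies the same join. It assumes df has the default RangeIndex; the name expanded is introduced here only for illustration. On recent pandas versions the new columns may come out in first-seen key order (foo, bar, baz) rather than the alphabetical order shown above, so the sketch selects columns by name instead of relying on their position.

```python
import pandas as pd

d = {
    'id': [1, 1, 1, 2, 2, 2],
    'data': [
        {'foo': True},
        {'foo': False, 'bar': True},
        {'foo': True, 'bar': False, 'baz': True},
        {'foo': False},
        {'foo': False, 'bar': False},
        {'foo': False, 'bar': True, 'baz': False},
    ],
}
df = pd.DataFrame(data=d)

# Each dict becomes one row; keys missing from a dict become NaN.
expanded = df.join(pd.DataFrame(df['data'].tolist()))

# Select by name so the result does not depend on column order.
print(expanded[['id', 'foo', 'bar', 'baz']])
```

Columns that contain NaN alongside booleans will typically end up with object dtype, which is worth keeping in mind before using them as boolean masks.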

If the "data" column is not desired in the output, you can pop it while expanding (note that pop removes the column from df in place):

df.join(pd.DataFrame(df.pop('data').tolist()))

   id    bar    baz    foo
0   1    NaN    NaN   True
1   1   True    NaN  False
2   1  False   True   True
3   2    NaN    NaN  False
4   2  False    NaN  False
5   2   True  False  False
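Because pop modifies df in place, running that line a second time raises a KeyError. If the original frame should stay untouched, or if df carries a non-default index, a variant along the following lines may be safer. This is a sketch rather than part of the original answer: it uses DataFrame.drop instead of pop and passes index=df.index so the expanded rows line up with df. The data is a shortened version of the question's, and the index values 10, 20, 30 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'id': [1, 1, 2],
        'data': [{'foo': True}, {'foo': False, 'bar': True}, {'foo': False}],
    },
    index=[10, 20, 30],  # hypothetical non-default index
)

# Build the expanded columns on the same index as df so join aligns row by row.
flags = pd.DataFrame(df['data'].tolist(), index=df.index)

# drop returns a new frame, so df itself keeps its 'data' column.
result = df.drop(columns='data').join(flags)
print(result)
```

Without index=df.index, the expanded frame would get a fresh 0..n-1 index and the join would fail to match any of the rows in this example.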

Reference: Convert a list of dictionaries to pandas DataFrame

cs95

Another option is DataFrame.from_records, applied here to the original dict d, with the id values used as the index:

pd.DataFrame.from_records(d['data'], index=d['id'])

     bar    baz    foo
1    NaN    NaN   True
1   True    NaN  False
1  False   True   True
2    NaN    NaN  False
2  False    NaN  False
2   True  False  False
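Note that this leaves the id values in the index rather than in a column. If id is wanted as a regular column, as in the question's expected output, one possible follow-up is to name the index and reset it. The rename_axis and reset_index steps are an addition here, not part of the original answer, and the name out is illustrative.

```python
import pandas as pd

d = {
    'id': [1, 1, 1, 2, 2, 2],
    'data': [
        {'foo': True},
        {'foo': False, 'bar': True},
        {'foo': True, 'bar': False, 'baz': True},
        {'foo': False},
        {'foo': False, 'bar': False},
        {'foo': False, 'bar': True, 'baz': False},
    ],
}

out = (
    pd.DataFrame.from_records(d['data'], index=d['id'])
    .rename_axis('id')  # give the index a name...
    .reset_index()      # ...then move it into a regular column
)
print(out)
```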
BENY