
I have the following data:

data = [{
  'color': ['red','green'],
  'name': 'obj1' 
}, {
  'color': ['blue','brown','pink'],
  'name': 'obj2'
}]

and when I use pandas, it gives me an output like this:

    color                 name
0   [red, green]          obj1
1   [blue, brown, pink]   obj2

but I need an output like this:

    color.0  color.1  color.2  name
0   red      green    NaN      obj1
1   blue     brown    pink     obj2

I have tried json_normalize but was unable to get the desired output.

Thanks in advance.

Satnam Sandhu

2 Answers


You can preprocess the list of dicts and then call the DataFrame constructor:

import pandas as pd

out = []
for x in data:
    d = {}
    for k, v in x.items():
        if isinstance(v, list):
            # expand each list into numbered keys like 'color.0', 'color.1'
            for i, y in enumerate(v):
                d['{}.{}'.format(k, i)] = y
        else:
            d[k] = v
    out.append(d)
print(out)
[{'color.0': 'red', 'color.1': 'green', 'name': 'obj1'}, 
 {'color.0': 'blue', 'color.1': 'brown', 'color.2': 'pink', 'name': 'obj2'}]

df = pd.DataFrame(out).sort_index(axis=1)
print(df)
  color.0 color.1 color.2  name
0     red   green     NaN  obj1
1    blue   brown    pink  obj2

Or you can create the DataFrame first and then expand the list column into multiple columns:

df = pd.DataFrame(data)
df1 = pd.DataFrame(df.pop('color').values.tolist(), index=df.index)

df = df.join(df1.add_prefix('color.')).sort_index(axis=1)
print(df)
  color.0 color.1 color.2  name
0     red   green    None  obj1
1    blue   brown    pink  obj2
jezrael

You can use pd.Series:

df = pd.DataFrame(data)
df[['color.0', 'color.1', 'color.2']] = df['color'].apply(pd.Series)
df = df.drop(columns='color')

    name    color.0 color.1 color.2
0   obj1    red     green   NaN
1   obj2    blue    brown   pink
Allen Qin
  • Never use [apply(pd.Series)](https://stackoverflow.com/questions/35491274/pandas-split-column-of-lists-into-multiple-columns/35491399#35491399), because it is slow. – jezrael Aug 17 '19 at 07:31
  • I trust your opinion, but if the data set is not large, I guess it's not going to be a big issue. – Allen Qin Aug 17 '19 at 07:53
  • Hmmm, in my opinion it is always best to write the fastest code, so always avoid slower alternatives (we can't assume the OP's data is small). – jezrael Aug 17 '19 at 07:55
  • And there is a really huge difference even at 7k rows, which is small data; with larger data it is worse. – jezrael Aug 17 '19 at 07:57
  • I agree the difference is huge, but in absolute terms it's still just a one-second difference. – Allen Qin Aug 17 '19 at 08:22
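The performance gap discussed above can be checked with a small sketch (the 7,000-row size is taken from the comment; absolute timings will vary by machine):

```python
# Compare expanding a list column via apply(pd.Series) versus passing
# the lists straight to the DataFrame constructor.
import time
import pandas as pd

df = pd.DataFrame({'color': [['red', 'green']] * 7000, 'name': 'obj'})

t0 = time.perf_counter()
slow = df['color'].apply(pd.Series)        # builds one Series per row
t1 = time.perf_counter()
fast = pd.DataFrame(df['color'].tolist(), index=df.index)  # one constructor call
t2 = time.perf_counter()

print('apply(pd.Series): {:.3f}s, constructor: {:.3f}s'.format(t1 - t0, t2 - t1))
# Both produce the same frame; the constructor route is typically much faster.
assert slow.equals(fast)
```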