Splitting a list in a Pandas cell into multiple columns

Question

I have a really simple Pandas dataframe where each cell contains a list. I'd like to split each element of the list into its own column. I can do that by exporting the values and then creating a new dataframe. This doesn't seem like a good way to do it, especially if my dataframe had a column aside from the list column.

import pandas as pd

df = pd.DataFrame(data=[[[8,10,12]],
                        [[7,9,11]]])

df = pd.DataFrame(data=[x[0] for x in df.values])

Desired output:

   0   1   2
0  8  10  12
1  7   9  11

Follow-up based on @Psidom's answer:

If I did have a second column:

df = pd.DataFrame(data=[[[8,10,12], 'A'],
                        [[7,9,11], 'B']])

How do I not lose the other column?

Desired output:

   0   1   2  3
0  8  10  12  A
1  7   9  11  B

@Psidom `apply(Series)` would work but [perhaps we could do better](https://stackoverflow.com/q/54432583/4909087). — cs95, Feb 03 '19 at 07:06

Psidom · Accepted Answer · 2016-12-02T03:57:22.863

26

You can loop through the Series with the apply() function and convert each list to a Series; this automatically expands each list across the columns:

df[0].apply(pd.Series)

#   0    1   2
#0  8   10  12
#1  7    9  11

Update: To keep other columns of the data frame, you can concatenate the result with the columns you want to keep:

pd.concat([df[0].apply(pd.Series), df[1]], axis = 1)

#   0    1   2  1
#0  8   10  12  A
#1  7    9  11  B
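
Note that the concatenated result above ends up with two columns labelled 1. A minimal sketch of one way to avoid that is to prefix the expanded columns and rename the kept column before joining. The `item_` prefix and the `label` name are illustrative choices, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame(data=[[[8, 10, 12], 'A'],
                        [[7, 9, 11], 'B']])

# Expand the list column, then prefix the new column labels
# ('item_' is an arbitrary, illustrative prefix).
expanded = df[0].apply(pd.Series).add_prefix('item_')

# Rename the kept column ('label' is also illustrative) and join on the index.
result = expanded.join(df[1].rename('label'))

print(result)
```

Joining on the index works here because `apply` keeps the original index of `df`.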

edited Dec 02 '16 at 03:57

answered Dec 02 '16 at 03:42


How can I not lose an additional column (modified original question)? – user2242044 Dec 02 '16 at 03:53
You need the `pd.concat()` function. See the update! – Psidom Dec 02 '16 at 03:57

Zero · Answer 2 · 2017-10-12T08:15:52.310

You could use pd.DataFrame(df[col].values.tolist()) instead; it is much faster (roughly 250x in the timings below):

In [820]: pd.DataFrame(df[0].values.tolist())
Out[820]:
   0   1   2
0  8  10  12
1  7   9  11

In [821]: pd.concat([pd.DataFrame(df[0].values.tolist()), df[1]], axis=1)
Out[821]:
   0   1   2  1
0  8  10  12  A
1  7   9  11  B
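
One caveat: building a new DataFrame from a plain list gives it a fresh 0..n-1 index, and `pd.concat(..., axis=1)` aligns on index labels. A minimal sketch of how to keep the rows aligned when the original frame has a non-default index, assuming illustrative index labels 'x' and 'y':

```python
import pandas as pd

# Same data as in the question, but with a non-default index
# (the labels 'x' and 'y' are illustrative).
df = pd.DataFrame(data=[[[8, 10, 12], 'A'],
                        [[7, 9, 11], 'B']],
                  index=['x', 'y'])

# Passing index=df.index keeps the expanded rows aligned with the original rows.
expanded = pd.DataFrame(df[0].tolist(), index=df.index)

result = pd.concat([expanded, df[1]], axis=1)
print(result)
```

Without `index=df.index`, the expanded frame would be indexed 0 and 1, and the concat would produce four rows padded with NaN instead of two.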

Timings

Medium

In [828]: df.shape
Out[828]: (20000, 2)

In [829]: %timeit pd.DataFrame(df[0].values.tolist())
100 loops, best of 3: 15 ms per loop

In [830]: %timeit df[0].apply(pd.Series)
1 loop, best of 3: 4.06 s per loop

Large

In [832]: df.shape
Out[832]: (200000, 2)

In [833]: %timeit pd.DataFrame(df[0].values.tolist())
10 loops, best of 3: 161 ms per loop

In [834]: %timeit df[0].apply(pd.Series)
1 loop, best of 3: 40.9 s per loop
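
The timings above come from an IPython session. A rough sketch of how the comparison could be reproduced in a plain script, assuming a synthetic frame of 2,000 rows of random three-element lists (the row count, seed, and data are assumptions; absolute numbers will vary by machine and pandas version):

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 2,000 rows, each holding a list of three integers.
rng = np.random.default_rng(0)
lists = rng.integers(0, 100, size=(2000, 3)).tolist()
df = pd.DataFrame({0: lists, 1: ['A'] * len(lists)})

fast = pd.DataFrame(df[0].values.tolist())
slow = df[0].apply(pd.Series)

# Check that both approaches produce the same values.
same = np.array_equal(fast.to_numpy(), slow.to_numpy())
print("same values:", same)

# Time each approach over a few repetitions.
t_fast = timeit.timeit(lambda: pd.DataFrame(df[0].values.tolist()), number=3)
t_slow = timeit.timeit(lambda: df[0].apply(pd.Series), number=3)
print(f"tolist: {t_fast:.4f} s, apply(pd.Series): {t_slow:.4f} s")
```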

This should be the accepted answer instead. – Jinhua Wang Aug 12 '20 at 17:53
