Unnest (explode) a Pandas Series

Question

I have:

df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [1, 2, 3]})

   col1  col2
0  asdf     1
1    xy     2
2     q     3

I'd like to take the "combinatoric product" of each letter from the strings in col1, with each elementwise int in col2. I.e.:

  col1  col2
0    a    1
1    s    1
2    d    1
3    f    1
4    x    2
5    y    2
6    q    3

Current method:

from itertools import product

pieces = []
for _, s in df.iterrows():
    letters = list(s.col1)
    prods = list(product(letters, [s.col2]))
    pieces.append(pd.DataFrame(prods))

pd.concat(pieces)

Any more efficient workarounds?

For any interested parties out there, I've timed everyone's answer [here](https://stackoverflow.com/a/48197395/4909087). — cs95, Jan 10 '18 at 23:00
related https://stackoverflow.com/questions/53218931/how-do-i-unnest-explode-a-column-in-a-pandas-dataframe — BENY, Dec 11 '18 at 15:05

cs95 · Accepted Answer · 2018-01-11T07:09:35.243

Using list + str.join and np.repeat -

pd.DataFrame(
{
     'col1' : list(''.join(df.col1)), 
     'col2' : df.col2.values.repeat(df.col1.str.len(), axis=0)
})

  col1  col2
0    a     1
1    s     1
2    d     1
3    f     1
4    x     2
5    y     2
6    q     3

A generalised solution for any number of columns is easily achievable, without much change to the solution -

i = list(''.join(df.col1))
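# Pass `columns=` explicitly; the positional axis argument was removed in pandas 2.0
j = df.drop(columns='col1').values.repeat(df.col1.str.len(), axis=0)

# Index.drop keeps the original column order (Index.difference would sort the labels)
df = pd.DataFrame(j, columns=df.columns.drop('col1'))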
df.insert(0, 'col1', i)

df

  col1 col2
0    a    1
1    s    1
2    d    1
3    f    1
4    x    2
5    y    2
6    q    3
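
If you need this more than once, the generalised idea can be wrapped in a small function. This is only a sketch: the name `unnest_strings` is made up for illustration (it is not a pandas API), and it assumes the string column has no missing values. It repeats row positions instead of raw values, so the remaining columns keep their dtypes and their original order.

```python
import numpy as np
import pandas as pd


def unnest_strings(df, col):
    """Hypothetical helper: return one row per character of df[col]."""
    # Number of characters in each string (assumes no NaN in the column).
    lengths = df[col].str.len().to_numpy()
    # Repeat each row position once per character in that row's string.
    positions = np.repeat(np.arange(len(df)), lengths)
    out = df.iloc[positions].reset_index(drop=True)
    # Overwrite the string column with the individual characters.
    out[col] = list(''.join(df[col]))
    return out


df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [1, 2, 3]})
result = unnest_strings(df, 'col1')
print(result)
```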

Performance

df = pd.concat([df] * 100000, ignore_index=True)

# MaxU's solution

%%timeit
df.col1.str.extractall(r'(.)') \
           .reset_index(level=1, drop=True) \
           .join(df['col2']) \
           .reset_index(drop=True)

1 loop, best of 3: 1.98 s per loop

# piRSquared's solution

%%timeit
pd.DataFrame(
     [[x] + b for a, *b in df.values for x in a],
     columns=df.columns
)

1 loop, best of 3: 1.68 s per loop

# Wen's solution

%%timeit
v = df.col1.apply(list)
pd.DataFrame({'col1':np.concatenate(v.values),'col2':df.col2.repeat(v.apply(len))})

1 loop, best of 3: 835 ms per loop

# Alexander's solution

%%timeit
pd.DataFrame([(letter, i) 
              for letters, i in zip(df['col1'], df['col2']) 
              for letter in letters],
             columns=df.columns)

1 loop, best of 3: 316 ms per loop

# This answer's solution (cs95)

%%timeit
pd.DataFrame(
{
     'col1' : list(''.join(df.col1)), 
     'col2' : df.col2.values.repeat(df.col1.str.len(), axis=0)
})

10 loops, best of 3: 124 ms per loop

I tried timing Vaishali's, but it took too long on this dataset.
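
`%%timeit` is an IPython magic, so the snippets above only run in IPython or Jupyter. If you want to repeat the comparison in a plain Python script, something along these lines should work with the standard `timeit` module. It is a rough sketch: the function names `join_repeat` and `comprehension` are just labels chosen here, the frame is deliberately smaller than the one used above, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [1, 2, 3]})
# 1000 copies keeps the script quick; the timings above used 100000.
big = pd.concat([base] * 1000, ignore_index=True)


def join_repeat(df):
    # list + str.join for the letters, ndarray.repeat for the other column
    return pd.DataFrame({
        'col1': list(''.join(df.col1)),
        'col2': df.col2.values.repeat(df.col1.str.len(), axis=0),
    })


def comprehension(df):
    # nested list comprehension over (string, value) pairs
    return pd.DataFrame(
        [(letter, i)
         for letters, i in zip(df['col1'], df['col2'])
         for letter in letters],
        columns=df.columns,
    )


timings = {}
for fn in (join_repeat, comprehension):
    # Best of 3 single runs, in seconds.
    timings[fn.__name__] = min(timeit.repeat(lambda: fn(big), number=1, repeat=3))
    print(f"{fn.__name__}: {timings[fn.__name__] * 1e3:.1f} ms")
```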

All the solutions here are nice; I'll note this one down as a dup target :-). As we discussed before, pandas should add a small API for unnesting :-) — BENY, Jan 10 '18 at 23:14

score 8 · Answer 2 · answered Jan 10 '18 at 22:43

pd.DataFrame([(letter, i) 
              for letters, i in zip(df['col1'], df['col2']) 
              for letter in letters],
             columns=df.columns)

answered Jan 10 '18 at 22:43

Alexander

score 8 · Answer 3 · answered Jan 10 '18 at 22:43

Trick from the list :-)

df.col1=df.col1.apply(list)
df
Out[489]: 
           col1  col2
0  [a, s, d, f]     1
1        [x, y]     2
2           [q]     3
pd.DataFrame({'col1':np.concatenate(df.col1.values),'col2':df.col2.repeat(df.col1.apply(len))})
Out[490]: 
  col1  col2
0    a     1
0    s     1
0    d     1
0    f     1
1    x     2
1    y     2
2    q     3

answered Jan 10 '18 at 22:43

BENY

I have been searching for more than 15 mins for this line: `df.col1=df.col1.apply(list)` – Modem Rakesh goud Jun 23 '20 at 14:32
@ModemRakeshgoud yw : -) and FYI https://stackoverflow.com/questions/53218931/how-to-unnest-explode-a-column-in-a-pandas-dataframe/53218939#53218939 – BENY Jun 23 '20 at 14:33

score 7 · Answer 4 · answered Jan 10 '18 at 22:43

In [86]: df.col1.str.extractall(r'(.)') \
           .reset_index(level=1, drop=True) \
           .join(df['col2']) \
           .reset_index(drop=True)
Out[86]:
   0  col2
0  a     1
1  s     1
2  d     1
3  f     1
4  x     2
5  y     2
6  q     3
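
Note that the letter column in the output above is labelled `0`, because the capture group is unnamed. One way to get `col1` back is a named group, since `str.extractall` uses group names as column labels. A small sketch of that variation (also keep in mind that `.` does not match newline characters by default):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [1, 2, 3]})

result = (
    df['col1'].str.extractall(r'(?P<col1>.)')  # named group -> column 'col1'
      .reset_index(level=1, drop=True)         # drop the per-row match counter
      .join(df['col2'])                        # align on the original row label
      .reset_index(drop=True)                  # renumber rows 0..n-1
)
print(result)
```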

answered Jan 10 '18 at 22:43

MaxU - stand with Ukraine

score 7 · Answer 5 · answered Jan 10 '18 at 22:45

One more:)

df.set_index('col2').col1.apply(lambda x: pd.Series(list(x))).stack()\
.reset_index(1,drop = True).reset_index(name = 'col1')

    col2    col1
0   1       a
1   1       s
2   1       d
3   1       f
4   2       x
5   2       y
6   3       q

score 4 · Answer 6 · answered Jan 11 '18 at 07:01

General solution with a list comprehension and clever unpacking:

pd.DataFrame(
    [[x] + b for a, *b in df.values for x in a],
    columns=df.columns
)

  col1  col2
0    a     1
1    s     1
2    d     1
3    f     1
4    x     2
5    y     2
6    q     3

answered Jan 11 '18 at 07:01

piRSquared

This is great. Currently, it takes `1 loop, best of 3: 1.68 s per loop`. I'll add timings for all your options once they're done. – cs95 Jan 11 '18 at 07:03
I'll stick to this one – piRSquared Jan 11 '18 at 07:04
Ah, in that case, added timing result to my answer! – cs95 Jan 11 '18 at 07:09

score 2 · Answer 7 · answered Feb 21 '20 at 08:45

Using Explode (pandas>=0.25)

df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [1, 2, 3]})

df.col1=df.col1.apply(list)
df = df.explode('col1')

Result:

  col1  col2
0    a     1
0    s     1
0    d     1
0    f     1
1    x     2
1    y     2
2    q     3
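
Keep in mind that `explode` preserves the original row labels, so the exploded frame is indexed 0, 0, 0, 0, 1, 1, 2 rather than 0 to 6. If you want the fresh index from the question, one option is the `ignore_index` argument, which as far as I know needs pandas >= 1.1 (on 0.25 to 1.0, `.reset_index(drop=True)` does the same job). A minimal sketch that also leaves the original `df` unmodified:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [1, 2, 3]})

# explode only splits list-likes, so turn each string into a list of characters first
result = (
    df.assign(col1=df['col1'].apply(list))
      .explode('col1', ignore_index=True)  # ignore_index: pandas >= 1.1
)
print(result)
```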

answered Feb 21 '20 at 08:45

Pygirl

score 0 · Answer 8 · answered Jul 10 '19 at 23:17

You can also use the itertools.chain and itertools.repeat functions to achieve a similar result.

An example would be

import pandas as pd
from itertools import chain, repeat

d = {'col1': ['asdf', 'xy', 'q'], 'col2': [1, 2, 3]}

expanded_d = {
    "col1": list(chain(*[list(item) for item in d["col1"]])),
    "col2": list(chain(*[list(repeat(d["col2"][idx], len(list(d["col1"][idx])))) for idx in range(len(d["col1"])) ]))
    }

result = pd.DataFrame(data=expanded_d)

  col1  col2
0    a     1
1    s     1
2    d     1
3    f     1
4    x     2
5    y     2
6    q     3

Hope it helps.
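
The same idea can probably be written a bit more compactly with `chain.from_iterable` and `zip`, which avoids the index bookkeeping and the intermediate lists. This is a sketch of that variation rather than a different technique; the name `expanded` is arbitrary.

```python
from itertools import chain, repeat

import pandas as pd

d = {'col1': ['asdf', 'xy', 'q'], 'col2': [1, 2, 3]}

expanded = {
    # A string is already an iterable of its characters.
    'col1': list(chain.from_iterable(d['col1'])),
    # Repeat each col2 value once per character of the matching string.
    'col2': list(chain.from_iterable(
        repeat(value, len(text)) for text, value in zip(d['col1'], d['col2'])
    )),
}

result = pd.DataFrame(expanded)
print(result)
```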
