How to expand an iterable stored as a single value in Python Pandas? (a.k.a. reverse groupby)

Question

I have the following DataFrame:

Index Letter Numbers
1     A      [1, 11]
2     B      [2, 22]
3     C      [3, 33]

And I want to "expand" the list of numbers just so each number has it's own row, like this:

Index Letter Numbers
1     A      1
2     A      11
3     B      2
4     B      22
5     C      3
6     C      33

I tried to achieve this by using the following code:

import pandas as pd

df = pd.DataFrame({
    'Letter': ['A', 'B', 'C'],
    'Numbers': [[1, 11], [2, 22], [3, 33]]
})

for i, row in df.iterrows():
    for num in row['Numbers']:
        new_row = row.copy()
        new_row['Numbers'] = num
        df = df.append(new_row, ignore_index=True)

df = df.loc[df.apply(lambda x: type(x['Numbers']) != list, axis=1)]

It works for the example, but in a larger DataFrame (a few hundred thousand lines for example) it takes a lot of time. Is there a better, more optimized way to do it? I tried using the apply method, but it clears my DataFrame for some reason...

`explode` is the most user friendly option. But it can be slow because operations like this don't scale particularly well. If performance is a real requirement, there are ways to use `numpy` to reconstruct the DataFrame. If you know all lists are the same size it can be very simple, otherwise you need a lot of care and it can get a bit messier — ALollz, Aug 23 '21 at 16:44

Karn Kumar · Answer 1 · 2021-08-23T17:40:15.560

As already mentioned in the comments, if you don't have huge data to parse, it's really been easy to use df.explode()

>>> df
  Letter  Numbers
0      A  [1, 11]
1      B  [2, 22]
2      C  [3, 33]

Solution 1

>>> df = df.explode('Numbers')
>>> df
  Letter Numbers
0      A       1
0      A      11
1      B       2
1      B      22
2      C       3
2      C      33

Solution 2:

This is faster with np.repeat ...

>>> lens = [len(item) for item in df['Numbers']]
>>> pd.DataFrame( {"Letter" : np.repeat(df['Letter'].values,lens), "Numbers" : np.hstack(df['Numbers'])})
  Letter  Numbers
0      A        1
1      A       11
2      B        2
3      B       22
4      C        3
5      C       33

Note:

Meanwhile, i see already an interesting answer for the same here another Solutions

Happy Coding!

Corralien · Answer 2 · 2021-08-23T17:04:27.800

Use explode to expand values vertically:

>>> df.explode('Numbers', ignore_index=True)

  Letter Numbers
0      A       1
1      A      11
2      B       2
3      B      22
4      C       3
5      C      33

Use apply(pd.Series) to expand values horizontally:


>>> df.join(df['Numbers'].apply(pd.Series)
                         .add_prefix('Numbers_')) \
      .drop(columns='Numbers')

  Letter  Numbers_0  Numbers_1
0      A          1         11
1      B          2         22
2      C          3         33

How to expand an iterable stored as a single value in Python Pandas? (a.k.a. reverse groupby)

2 Answers2

Solution 1

Solution 2:

Note: