2

I have the following DataFrame:

Index Letter Numbers
1     A      [1, 11]
2     B      [2, 22]
3     C      [3, 33]

And I want to "expand" the list of numbers just so each number has it's own row, like this:

Index Letter Numbers
1     A      1
2     A      11
3     B      2
4     B      22
5     C      3
6     C      33

I tried to achieve this by using the following code:

import pandas as pd

df = pd.DataFrame({
    'Letter': ['A', 'B', 'C'],
    'Numbers': [[1, 11], [2, 22], [3, 33]]
})

for i, row in df.iterrows():
    for num in row['Numbers']:
        new_row = row.copy()
        new_row['Numbers'] = num
        df = df.append(new_row, ignore_index=True)

df = df.loc[df.apply(lambda x: type(x['Numbers']) != list, axis=1)]

It works for the example, but in a larger DataFrame (a few hundred thousand lines for example) it takes a lot of time. Is there a better, more optimized way to do it? I tried using the apply method, but it clears my DataFrame for some reason...

csantos
  • 47
  • 5
  • 2
    Use `explode`: `df.explode('Numbers')` – Psidom Aug 23 '21 at 16:42
  • 1
    `explode` is the most user friendly option. But it can be slow because operations like this don't scale particularly well. If performance is a real requirement, there are ways to use `numpy` to reconstruct the DataFrame. If you know all lists are the same size it can be very simple, otherwise you need a lot of care and it can get a bit messier – ALollz Aug 23 '21 at 16:44

2 Answers2

1

As already mentioned in the comments, if you don't have huge data to parse, it's really been easy to use df.explode()

>>> df
  Letter  Numbers
0      A  [1, 11]
1      B  [2, 22]
2      C  [3, 33]

Solution 1

>>> df = df.explode('Numbers')
>>> df
  Letter Numbers
0      A       1
0      A      11
1      B       2
1      B      22
2      C       3
2      C      33

Solution 2:

This is faster with np.repeat ...

>>> lens = [len(item) for item in df['Numbers']]
>>> pd.DataFrame( {"Letter" : np.repeat(df['Letter'].values,lens), "Numbers" : np.hstack(df['Numbers'])})
  Letter  Numbers
0      A        1
1      A       11
2      B        2
3      B       22
4      C        3
5      C       33

Other Solutions ..

df.explode('Numbers', ignore_index=True)
# df.explode('Numbers').reset_index(drop=True)

  Letter Numbers
0      A       1
1      A      11
2      B       2
3      B      22
4      C       3
5      C      33

with apply + pd.Series

 >>> df.set_index('Letter').Numbers.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'Numbers'})
  Letter  Numbers
0      A        1
1      A       11
0      B        2
1      B       22
0      C        3
1      C       33

Note:

Meanwhile, i see already an interesting answer for the same here another Solutions

Happy Coding!

Karn Kumar
  • 8,518
  • 3
  • 27
  • 53
1

Use explode to expand values vertically:

>>> df.explode('Numbers', ignore_index=True)

  Letter Numbers
0      A       1
1      A      11
2      B       2
3      B      22
4      C       3
5      C      33

Use apply(pd.Series) to expand values horizontally:


>>> df.join(df['Numbers'].apply(pd.Series)
                         .add_prefix('Numbers_')) \
      .drop(columns='Numbers')

  Letter  Numbers_0  Numbers_1
0      A          1         11
1      B          2         22
2      C          3         33
Corralien
  • 109,409
  • 8
  • 28
  • 52