How to explode each row in a Pandas Dataframe into multiple rows

Question

I have a function that is getting passed a pandas dataframe, and for each row in that dataframe I'd like to create N other rows, each of which are equivalent to the original row except for 2 column values.

What's the right way to do this - especially in a RAM-effective manner?

My attempts so far have been to run pd.apply, and then for each of those rows call a function that returns a list of pd.Series objects that I would then call append on to add them to the original DataFrame. This hasn't worked out, though.

Here is an example I tried with some dummy code to replicate:

students = [ ('Jack', 34, 'Sydney' , 'Australia') ,
             ('Jill', 30, 'New York' , 'USA' ) ]

# Create a DataFrame object
df = pd.DataFrame(students, columns = ['Name' , 'Age', 'City' , 'Country'], index=['a', 'b', 'c' , 'd' , 'e' , 'f'])

# function I will use to explode a single row into 10 new rows
def replicate(x):
    new_rows = []
    i = 0
    for j in range(3):
        y = x.copy(deep=True)
        y.Age = i
        i += 1
        new_rows.append(y)
    return new_rows

# Iterate over each row and append the results
df.apply(lambda x: df.append(replicate(x))

For the above, I'd expect output like the following:

Jack, 34, Sydney, Australia
Jack, 0, Sydney, Australia
Jack, 1, Sydney, Australia
Jack, 2, Sydney, Australia
Jill, 30, New York, USA
Jill, 0, New York, USA
Jill, 1, New York, USA
Jill, 2, New York, USA

In the end, I'd like my dataframe to have N times as many rows, where I can compute derived rows from the original rows. I'd like to do this in a space effective manner, and this isn't happening right now. Any help is appreciated!

Could you please point out what is your expected result for this input? — Daniel Labbe, Feb 18 '19 at 20:13
https://stackoverflow.com/questions/36875648/copy-pandas-dataframe-row-to-multiple-other-rows Try checking this out. There is many others similar to this question. — johnnyb, Feb 18 '19 at 20:21
@johnnyb That question seems to be overwriting values from one df into another, which I think is different than extending the length of the current df through derived rows. Although, maybe I'm missing something important that you're pointing out :) — James, Feb 18 '19 at 20:27

ALollz · Answer 1 · 2019-02-18T20:35:23.863

IIUC, you want np.repeat, using the Age column to specify the number of repeats, then fix the age column after the fact.

import pandas as pd

df1 = pd.DataFrame(df.values.repeat(df.Age+1, axis=0), 
                   columns=['Name', 'Age', 'City', 'Country'])
df1['Age'] = (df1.groupby([*df1]).cumcount()-1).where(df1.duplicated(), df1['Age'])

Output `df1`:

    Name Age      City    Country
0   Jack  34    Sydney  Australia
1   Jack   0    Sydney  Australia
2   Jack   1    Sydney  Australia
3   Jack   2    Sydney  Australia
4   Jack   3    Sydney  Australia
...
34  Jack  33    Sydney  Australia
35  Jill  30  New York        USA
...
63  Jill  27  New York        USA
64  Jill  28  New York        USA
65  Jill  29  New York        USA

[66 rows x 4 columns]

Input `df`:

   Name  Age      City    Country
a  Jack   34    Sydney  Australia
b  Jill   30  New York        USA

score 1 · Answer 2 · answered Feb 18 '19 at 20:47

IIUC

d={x : y.set_index('Age').reindex(range(y['Age'].iloc[0]+1),method='bfill') for x , y in df.groupby(level=0)}
newdf=pd.concat(d).reset_index(level=1)
newdf
Out[220]: 
    Age  Name      City    Country
a     0  Jack    Sydney  Australia
a     1  Jack    Sydney  Australia
a     2  Jack    Sydney  Australia
a     3  Jack    Sydney  Australia
a     4  Jack    Sydney  Australia
a     5  Jack    Sydney  Australia
a     6  Jack    Sydney  Australia
a     7  Jack    Sydney  Australia
a     8  Jack    Sydney  Australia
a     9  Jack    Sydney  Australia
a    10  Jack    Sydney  Australia
a    11  Jack    Sydney  Australia
a    12  Jack    Sydney  Australia
a    13  Jack    Sydney  Australia
a    14  Jack    Sydney  Australia
a    15  Jack    Sydney  Australia
a    16  Jack    Sydney  Australia
a    17  Jack    Sydney  Australia
a    18  Jack    Sydney  Australia
a    19  Jack    Sydney  Australia
a    20  Jack    Sydney  Australia
a    21  Jack    Sydney  Australia
a    22  Jack    Sydney  Australia
a    23  Jack    Sydney  Australia
a    24  Jack    Sydney  Australia
a    25  Jack    Sydney  Australia
a    26  Jack    Sydney  Australia
a    27  Jack    Sydney  Australia
a    28  Jack    Sydney  Australia
a    29  Jack    Sydney  Australia
..  ...   ...       ...        ...
b     1  Jill  New York        USA
b     2  Jill  New York        USA
b     3  Jill  New York        USA
b     4  Jill  New York        USA
b     5  Jill  New York        USA
b     6  Jill  New York        USA
b     7  Jill  New York        USA
b     8  Jill  New York        USA
b     9  Jill  New York        USA
b    10  Jill  New York        USA
b    11  Jill  New York        USA
b    12  Jill  New York        USA
b    13  Jill  New York        USA
b    14  Jill  New York        USA
b    15  Jill  New York        USA
b    16  Jill  New York        USA
b    17  Jill  New York        USA
b    18  Jill  New York        USA
b    19  Jill  New York        USA
b    20  Jill  New York        USA
b    21  Jill  New York        USA
b    22  Jill  New York        USA
b    23  Jill  New York        USA
b    24  Jill  New York        USA
b    25  Jill  New York        USA
b    26  Jill  New York        USA
b    27  Jill  New York        USA
b    28  Jill  New York        USA
b    29  Jill  New York        USA
b    30  Jill  New York        USA
[66 rows x 4 columns]

score 0 · Answer 3 · answered Feb 18 '19 at 20:25

You can put you dataframe inside a list and then do whatever you want:

# x5 row duplicate

df = df.append([df]*5, ignore_index=True)
df.sort_values(by='Name').head(15)


# Result

Name    Age     City    Country
28  John    16  New York    US
4   John    16  New York    US
22  John    16  New York    US
34  John    16  New York    US
16  John    16  New York    US
10  John    16  New York    US
17  Mike    17  las vegas   US
29  Mike    17  las vegas   US
23  Mike    17  las vegas   US
11  Mike    17  las vegas   US
35  Mike    17  las vegas   US
5   Mike    17  las vegas   US
3   Neelu   32  Bangalore   India
33  Neelu   32  Bangalore   India
15  Neelu   32  Bangalore   India

How to explode each row in a Pandas Dataframe into multiple rows

3 Answers3

Output df1:

Input df:

Output `df1`:

Input `df`: