Pandas: efficiently inserting a large number of rows

Question

I have a large dataframe in this format, call this df:

index	val1	val2
0	0.2	0.1
1	0.5	0.7
2	0.3	0.4

I have a row I will be inserting, call this myrow:

index	val1	val2
-1	0.9	0.9

I wish to insert this row 3 times after every row in the original dataframe, i.e.:

index	val1	val2
0	0.2	0.1
-1	0.9	0.9
-1	0.9	0.9
-1	0.9	0.9
1	0.5	0.7
-1	0.9	0.9
-1	0.9	0.9
-1	0.9	0.9
2	0.3	0.4
-1	0.9	0.9
-1	0.9	0.9
-1	0.9	0.9

This is straightforward with a bit of looping. TLDR: how do I do this more efficiently?

Let's make a repeat rows function, and create our set of 3 repeats:

import pandas as pd
import numpy as np

def repeat_rows(df, n):
    newdf = pd.DataFrame(np.repeat(df.values, n, axis=0))
    newdf.columns = df.columns
    return newdf

repeats = repeat_rows(myrow.to_frame().T, 3)

Now we have our 3 repeats:

index	val1	val2
-1	0.9	0.9
-1	0.9	0.9
-1	0.9	0.9

Finally, we can loop over the original df's rows, and concat repeats to the row, and concat the result of all of those together:

blocks = []
for _, row in df.iterrows():
    blocks.append(pd.concat([row.to_frame().T, repeats]))
result = pd.concat(blocks)

We now have the desired result!

The problem is, this is very slow, and I'm looking for a faster solution.

I'm guessing a better solution would follow this pattern:

result = repeat_rows(df, 4)
result.loc[LAST_3_ROWS_IN_EACH_BLOCK_OF_4] = myrow

However, I'm not sure how to do such a loc assignment. How can I make my solution more efficient?
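
One way the pattern above might be written is with a boolean mask over row positions. This is a minimal sketch, not a definitive implementation: it assumes `myrow` is a Series whose labels match `df`'s columns, that `'index'` is an ordinary column, and that all columns are numeric. It does the assignment on the underlying NumPy array rather than through `.loc`, and the names `block` and `is_filler` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'index': [0, 1, 2],
                   'val1': [0.2, 0.5, 0.3],
                   'val2': [0.1, 0.7, 0.4]})
myrow = pd.Series({'index': -1, 'val1': 0.9, 'val2': 0.9})

n = 3          # copies of myrow to place after each original row
block = n + 1  # one original row followed by n filler rows

# Repeat every original row `block` times as a plain 2-D array.
values = np.repeat(df.to_numpy(), block, axis=0)

# Positions that are not the first row of their block are fillers.
is_filler = np.arange(len(values)) % block != 0

# NumPy broadcasts the 1-D row across all selected rows.
values[is_filler] = myrow[df.columns].to_numpy()

# Rebuild the frame and restore the original column dtypes.
result = pd.DataFrame(values, columns=df.columns).astype(df.dtypes.to_dict())
```

With non-numeric columns, `to_numpy()` would fall back to an object array, so this sketch is probably only worthwhile for all-numeric frames.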

Is `'index'` your actual DataFrame Index, or a column labeled 'index'? — ALollz, Jul 30 '21 at 14:49
For what it's worth, using `df.loc`, `pd.concat` or `df.append` in a loop is very slow. You should try gathering all your data first before transforming it into a DataFrame. Take a look at [this answer](https://stackoverflow.com/a/62734983/11161432) for more discussion and examples. — jfaccioni, Jul 30 '21 at 15:04

ALollz · Accepted Answer · 2021-07-30T15:04:02.253

Call `reset_index` so that `df` has a simple RangeIndex. We can then use tiling and repeats to build a fractional index for the copies of `myrow` that, when sorted together with `df`'s index, places 3 `myrow` rows after each row of your DataFrame. Finally, drop this index to get back to a normal RangeIndex.

Sample Data

import pandas as pd
import numpy as np

myrow = pd.DataFrame({'index': [-1], 'val1': [0.9], 'val2': [0.9]})
df = pd.DataFrame({'index': [0,1,2],
                   'val1': [0.2, 0.5, 0.3],
                   'val2': [0.1, 0.7, 0.4]})

Code

# Ensure starting from a RangeIndex
df = df.reset_index(drop=True)

NR = 3  # Number of repeats

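mr = pd.concat([myrow]*len(df)*NR, ignore_index=True)
# Fractional offsets 0, 1/NR, ..., (NR-1)/NR; np.arange(NR)/NR avoids a float step in arange
mr.index = df.index.repeat(NR) + np.tile(np.arange(NR)/NR, len(df))

# `mr` second in the `concat` so rows go below; a stable sort keeps that order
# where an offset of 0 ties with the original row's label
df = pd.concat([df, mr]).sort_index(kind='mergesort').reset_index(drop=True)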

    index  val1  val2
0       0   0.2   0.1
1      -1   0.9   0.9
2      -1   0.9   0.9
3      -1   0.9   0.9
4       1   0.5   0.7
5      -1   0.9   0.9
6      -1   0.9   0.9
7      -1   0.9   0.9
8       2   0.3   0.4
9      -1   0.9   0.9
10     -1   0.9   0.9
11     -1   0.9   0.9
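
To see why sorting interleaves the rows, it can help to look at the fractional index on its own. The following is a minimal sketch for 3 original rows and `NR = 3`; `n_rows`, `base`, `offsets` and `keys` are illustrative names, and the offsets are written as `np.arange(NR) / NR`, which gives the same 0, 1/3, 2/3 pattern as the answer's code.

```python
import numpy as np
import pandas as pd

n_rows = 3  # stands in for len(df)
NR = 3      # number of repeats, as in the answer

# Each original label repeated NR times: 0,0,0,1,1,1,2,2,2
base = pd.RangeIndex(n_rows).repeat(NR)

# Offsets within each group: 0, 1/3, 2/3, tiled once per original row
offsets = np.tile(np.arange(NR) / NR, n_rows)

# Every key lies in [label, label + 1), so it sorts before the next label
keys = base + offsets
print(np.round(keys.to_numpy(), 2).tolist())
```

The first key in each group equals the original row's label, so the inserted copy only lands below the original row if the sort preserves the concatenation order for ties.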

score 0 · Answer 2 · answered Jul 30 '21 at 14:53

This looks to me like you might not be solving the right problem. It might help if you tell us what you're actually trying to achieve by doing this!

Nevertheless, a row-by-row loop over a DataFrame is usually a bad idea for performance!

There are some suggestions here: Pandas - Interleave / Zip two DataFrames by row

The basic idea would be to create a table of extra rows, which you concat with your original table:

# ROW is the one-row DataFrame to insert; make 3 copies per row of df
extra = pd.DataFrame(np.repeat(ROW.values, 3 * len(df), axis=0), columns=ROW.columns)
df = pd.concat([df, extra])

Finally sort to put everything in the right order. For the sort to work you'll need to add an index column to each table - but that shouldn't be /too/ tricky.
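
The sort-key idea described above might look like the following. This is a minimal sketch, assuming the row to insert is a one-row DataFrame (named `myrow` here, `ROW` in the snippet above) and three copies per original row; the helper columns `_pos` and `_sub` are hypothetical names introduced only for sorting.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'index': [0, 1, 2],
                   'val1': [0.2, 0.5, 0.3],
                   'val2': [0.1, 0.7, 0.4]})
myrow = pd.DataFrame({'index': [-1], 'val1': [0.9], 'val2': [0.9]})
n = 3  # copies of myrow after each original row

# One copy of myrow per (original row, repeat) pair, keeping column dtypes.
extra = myrow.loc[myrow.index.repeat(len(df) * n)]

# Sort keys: _pos is the original row position, _sub orders rows within
# a block (0 for the original row, 1..n for the inserted copies).
keyed_df = df.assign(_pos=np.arange(len(df)), _sub=0)
keyed_extra = extra.assign(_pos=np.repeat(np.arange(len(df)), n),
                           _sub=np.tile(np.arange(1, n + 1), len(df)))

result = (pd.concat([keyed_df, keyed_extra], ignore_index=True)
            .sort_values(['_pos', '_sub'])
            .drop(columns=['_pos', '_sub'])
            .reset_index(drop=True))
```

Because every (`_pos`, `_sub`) pair is unique, the result does not depend on whether the sort is stable.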
