I have a large dataframe in this format, call this df
:
index | val1 | val2 |
---|---|---|
0 | 0.2 | 0.1 |
1 | 0.5 | 0.7 |
2 | 0.3 | 0.4 |
I have a row I will be inserting, call this myrow
:
index | val1 | val2 |
---|---|---|
-1 | 0.9 | 0.9 |
I wish to insert this row 3 times after every row in the original dataframe, i.e.:
index | val1 | val2 |
---|---|---|
0 | 0.2 | 0.1 |
-1 | 0.9 | 0.9 |
-1 | 0.9 | 0.9 |
-1 | 0.9 | 0.9 |
1 | 0.5 | 0.7 |
-1 | 0.9 | 0.9 |
-1 | 0.9 | 0.9 |
-1 | 0.9 | 0.9 |
2 | 0.3 | 0.4 |
-1 | 0.9 | 0.9 |
-1 | 0.9 | 0.9 |
-1 | 0.9 | 0.9 |
This is straightforward with a bit of looping. TLDR: how do I do this more efficiently?
Let's make a repeat rows function, and create our set of 3 repeats:
import pandas as pd
import numpy as np
def repeat_rows(df, n):
newdf = pd.DataFrame(np.repeat(df.values, n, axis=0))
newdf.columns = df.columns
return newdf
repeats = repeat_rows(myrow.to_frame().T, 3)
Now we have our 3 repeats:
index | val1 | val2 |
---|---|---|
-1 | 0.9 | 0.9 |
-1 | 0.9 | 0.9 |
-1 | 0.9 | 0.9 |
Finally, we can loop over the original df
's rows, and concat repeats
to the row, and concat the result of all of those together:
blocks = []
for _, row in df.iterrows():
blocks.append(pd.concat([row.to_frame().T, repeats]))
result = pd.concat(blocks)
We now have the desired result!
The problem is, this is very slow, and I'm looking for a faster solution.
I'm guessing a better solution would follow this pattern:
result = repeat_rows(df, 4)
result.loc[LAST_3_ROWS_IN_EACH_BLOCK_OF_4] = myrow
However, I'm not sure how to do such a loc assignment. How can I make my solution more efficient?