I want to expand my dataframe with duplicate the row regularly.
import pandas as pd
import numpy as np
def expandData(data, timeStep=2, sampleLen= 5):
dataEp = pd.DataFrame()
for epoch in range(int(len(data)/sampleLen)):
dataSample = data.iloc[epoch*sampleLen:(epoch+1)*sampleLen, :]
for num in range(int(sampleLen-timeStep +1)):
tempDf = dataSample.iloc[num:timeStep+num,:]
dataEp = pd.concat([dataEp, tempDf],axis= 0)
return dataEp
df = pd.DataFrame({'a':list(np.arange(5))+list(np.arange(15,20)),
'other':list(np.arange(100,110))})
dfEp = expandData(df, 3, 5)
Output:
df
a other
0 0 100
1 1 101
2 2 102
3 3 103
4 4 104
5 15 105
6 16 106
7 17 107
8 18 108
9 19 109
dfEp
a other
0 0 100
1 1 101
2 2 102
1 1 101
2 2 102
3 3 103
2 2 102
3 3 103
4 4 104
5 15 105
6 16 106
7 17 107
6 16 106
7 17 107
8 18 108
7 17 107
8 18 108
9 19 109
Expected:
I expect a better a way of achieving it with good performance, as if the dataframe has large row size,such as 40 thousands rows, my code will run for about 20 minutes.
Edit:
Actually, I expect to repeat a small sequence with size of timeStep
. And I have changed expandData(df, 2, 5)
into expandData(df, 3, 5)
.