How to duplicate Python dataframe one by one?

Question

I have a pandas.DataFrame as follows:

I'd like to repeat each row three times so that it becomes:

I currently build df2 in a loop, but that is not efficient.

How can I get df2 from df1 in a faster, vectorized (matrix) way?

*"one by one"* doesn't say whether you mean by row or by column. You want to duplicate each **row** n times. — smci, Dec 08 '19 at 01:08
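To make the row-versus-block distinction concrete, here is a small sketch contrasting np.repeat (each row copied in place, which is what the question asks for) with np.tile (the whole frame stacked as a block). The two-row sample frame is an assumption modeled on the test data used in Answer 3.

```python
import numpy as np
import pandas as pd

# Assumed sample data (same shape as the test frame used in Answer 3)
df = pd.DataFrame([[1, 2], [3, 4]], columns=list("ab"))

# np.repeat along axis 0: each row is copied 3 times in place
by_row = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)

# np.tile with reps (3, 1): the whole block of rows is stacked 3 times
by_block = pd.DataFrame(np.tile(df.values, (3, 1)), columns=df.columns)

print(by_row["a"].tolist())    # [1, 1, 1, 3, 3, 3]
print(by_block["a"].tolist())  # [1, 3, 1, 3, 1, 3]
```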

score 5 · Answer 1 · answered May 07 '17 at 05:35

Build a one-dimensional indexer and use it to slice both the values array and the index. You must take care of the index as well to get your desired results.

1. Use np.repeat on an np.arange to get the indexer.
2. Construct a new DataFrame by applying this indexer to both the values and the index.

import numpy as np
import pandas as pd
r = np.arange(len(df)).repeat(3)  # positional indexer: 0, 0, 0, 1, 1, 1, ...
pd.DataFrame(df.values[r], df.index[r], df.columns)

   a  b
0  1  2
0  1  2
0  1  2
1  3  4
1  3  4
1  3  4
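A self-contained version of this approach, as a sketch: the question's original frame did not survive, so the two-row sample below is an assumption borrowed from Answer 3's test code. Because the same indexer r slices both df.values and df.index, each copied row keeps its original label.

```python
import numpy as np
import pandas as pd

# Assumed sample data (the question's original frame is not shown)
df = pd.DataFrame([[1, 2], [3, 4]], columns=list("ab"))

n = 3  # number of copies of each row
# Positional indexer: [0, 0, 0, 1, 1, 1]
r = np.arange(len(df)).repeat(n)

# Slice values and index with the same indexer so labels stay with their rows
df2 = pd.DataFrame(df.values[r], df.index[r], df.columns)
print(df2)
```

One design note: df.values converts the frame to a single NumPy array, so mixed column dtypes are upcast to a common dtype (often object). df.iloc[r] performs the same positional take while keeping each column's dtype and the repeated index.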

piRSquared

You mean when the wives of the big guns insist they do something other than answer questions on SO :-) – piRSquared May 07 '17 at 05:42
Yes, exactly. Mine had to drive our daughter to her senior prom over an hour away, so I was free... – Stephen Rauch May 07 '17 at 05:43

Vaishali · Answer 2 · 2017-05-07T17:55:54.707

You can use np.repeat:

# [3, 3] gives one repeat count per row (the example df has two rows); a scalar 3 repeats every row
df = pd.DataFrame(np.repeat(df.values, [3, 3], axis=0), columns=df.columns)

You get

Timing:

%timeit pd.DataFrame(np.repeat(df.values,[3,3], axis = 0))
1000 loops, best of 3: 235 µs per loop

%timeit pd.concat([df] * 3).sort_index()
best of 3: 1.26 ms per loop

NumPy is faster here (235 µs vs. 1.26 ms, roughly 5×), which is no surprise.
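A rough way to reproduce this kind of comparison outside IPython is the standard-library timeit module. The loop-based function below is a hypothetical stand-in for the question's loop (which is not shown), and the 200-row frame is an assumed size; absolute timings vary by machine and pandas version, so the sketch only prints them.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test frame: 200 rows, 2 integer columns
df = pd.DataFrame(np.arange(400).reshape(200, 2), columns=list("ab"))
n = 3  # copies of each row


def with_loop(df, n):
    # Hypothetical loop version: collect n copies of each row one at a time
    rows = []
    for i in range(len(df)):
        for _ in range(n):
            rows.append(df.iloc[i])
    return pd.DataFrame(rows)


def with_numpy(df, n):
    # Vectorized: a single np.repeat over the whole values array
    return pd.DataFrame(np.repeat(df.values, n, axis=0), columns=df.columns)


def with_concat(df, n):
    # Stack n copies, then sort so copies of the same label become adjacent
    return pd.concat([df] * n).sort_index()


for f in (with_loop, with_numpy, with_concat):
    seconds = timeit.timeit(lambda: f(df, n), number=5) / 5
    print(f"{f.__name__}: {seconds * 1e3:.2f} ms per call")
```

All three functions produce the same rows in the same order; only the way they get there differs, which is what the timing exposes.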

EDIT: I am not sure whether you are looking for repeated indices, but in case you are:

pd.DataFrame(np.repeat(df.values,3, axis = 0), index = np.repeat(df.index, 3), columns = df.columns)
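np.repeat also accepts one count per row, which is what the [3, 3] in the first snippet is. As a sketch (the counts [2, 1] and the sample frame are assumptions for illustration), repeating the values and the index with the same counts keeps each label attached to its row:

```python
import numpy as np
import pandas as pd

# Assumed sample data
df = pd.DataFrame([[1, 2], [3, 4]], columns=list("ab"))

counts = [2, 1]  # hypothetical per-row repeat counts: row 0 twice, row 1 once
out = pd.DataFrame(
    np.repeat(df.values, counts, axis=0),  # values: rows 0, 0, 1
    index=df.index.repeat(counts),         # labels repeated with the same counts
    columns=df.columns,
)
print(out)
```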

You are missing the index being repeated as well – piRSquared May 07 '17 at 05:38

score 2 · Answer 3 · answered May 07 '17 at 01:31

I do not know if it is more efficient than your loop, but it is easy enough to construct:

Code:

pd.concat([df] * 3).sort_index()

Test Code:

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('ab'))
print(pd.concat([df] * 3).sort_index())

Results:

Stephen Rauch

You nailed the index as well... even if it is slower. – piRSquared May 07 '17 at 05:37
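For comparison, a sketch (assuming the same two-row test frame) checking that the concat-and-sort result matches a label-based selection with Index.repeat, which also repeats the index but avoids the extra sort:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list("ab"))

# Answer 3: stack 3 copies, then sort so copies of each label sit together
via_concat = pd.concat([df] * 3).sort_index()

# Alternative: select each label 3 times directly (assumes the index is unique)
via_loc = df.loc[df.index.repeat(3)]

pd.testing.assert_frame_equal(via_concat, via_loc)
print(via_loc.index.tolist())  # [0, 0, 0, 1, 1, 1]
```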

jezrael · Answer 4 · 2017-05-07T05:43:41.417

You can use numpy.repeat with the scalar 3 as the repeat count, then pass the columns parameter to the DataFrame constructor:

df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print (df)
   a  b
0  1  2
1  1  2
2  1  2
3  3  4
4  3  4
5  3  4

If you really want a duplicated index (be aware that duplicate labels can break some pandas functions; reindex, for example, fails on them):

r = np.repeat(np.arange(len(df.index)), 3)
df = pd.DataFrame(df.values[r], df.index[r], df.columns)
print (df)
   a  b
0  1  2
0  1  2
0  1  2
1  3  4
1  3  4
1  3  4
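The reindex caveat can be checked with a small sketch (sample frame assumed): on an index with duplicate labels, pandas refuses to reindex and raises a ValueError. The exact message differs between pandas versions, so the sketch only relies on the exception type.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list("ab"))
r = np.repeat(np.arange(len(df.index)), 3)
dup = pd.DataFrame(df.values[r], df.index[r], df.columns)  # index 0, 0, 0, 1, 1, 1

try:
    dup.reindex([0, 1, 2])
    failed = False
except ValueError as exc:
    failed = True
    print("reindex failed:", exc)

# Resetting to a fresh 0..n-1 index avoids the problem if the old labels are not needed
clean = dup.reset_index(drop=True)
print(clean.reindex(range(7)).shape)  # (7, 2): the new label 6 is filled with NaN
```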

edited May 07 '17 at 05:43

answered May 07 '17 at 05:31

jezrael

Plus one from me! – piRSquared May 07 '17 at 05:43

score 0 · Answer 5 · answered May 07 '17 at 07:12

Not the fastest (not the slowest either) but the shortest solution so far.

# Build an index array and extract those rows; this handles the index and the data at once.
# iloc takes positions, so this relies on the default 0..n-1 index (use df.loc for other unique indexes).
df.iloc[np.repeat(df.index, 3)]

   a  b
0  1  2
0  1  2
0  1  2
1  3  4
1  3  4
1  3  4
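Because iloc works on positions, this one-liner relies on the default 0..n-1 index. A sketch with an assumed string index shows a positional variant and a label-based variant side by side; both give the same result:

```python
import numpy as np
import pandas as pd

# Assumed frame with a non-default, unique string index
df = pd.DataFrame([[1, 2], [3, 4]], columns=list("ab"), index=["x", "y"])

# Positional: repeat the row positions, not the labels
by_pos = df.iloc[np.arange(len(df)).repeat(3)]

# Label-based: repeat the labels and select them with loc
by_label = df.loc[np.repeat(df.index, 3)]

print(by_pos.index.tolist())  # ['x', 'x', 'x', 'y', 'y', 'y']
```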