replicate rows by n times in python

Question

I have a dataframe that the first column are strings, the 2nd column are the numbers that I want to replicate each string.

df = pd.DataFrame({'String':['a','b', 'c', 'd', 'e', 'f', 'g'],
        'Times':[2, 3, 5, 3, 2, 4, 5]})
df

String	Times
a	2
b	3
c	5
d	3
e	2
f	4
g	5

How can I create a data frame into this in python? (stopped at d but you know the pattern). Thanks!

String	Times
a	1
a	2
b	1
b	2
b	3
c	1
c	2
c	3
c	4
c	5

Related: [How can I replicate rows in Pandas?](/q/50788508/4518341) — wjandrea, Feb 08 '22 at 21:09

score 3 · Answer 1 · answered Feb 08 '22 at 21:15

A bit messy, but seems to do what you want:

new_df = pd.DataFrame(df.String.str.repeat(df.Times).apply(list).explode().reset_index(drop=True))

new_df["Times"] = new_df.groupby("String").cumcount() + 1

Output:

   String  Times
0       a      1
1       a      2
2       b      1
3       b      2
4       b      3
5       c      1
6       c      2
7       c      3
8       c      4
9       c      5
10      d      1
11      d      2
12      d      3
13      e      1
14      e      2
15      f      1
16      f      2
17      f      3
18      f      4
19      g      1
20      g      2
21      g      3
22      g      4
23      g      5

Honestly surprised mine performed the worst. Probably due to new DataFrame creation? — ddejohn, Feb 08 '22 at 22:07

score 3 · Accepted Answer · answered Feb 08 '22 at 21:17

Another method could be:

df.assign(Times = df.Times.apply(lambda x: range(1, x + 1))).explode('Times')
Out[]: 
  String Times
0      a     1
0      a     2
1      b     1
1      b     2
1      b     3
2      c     1
2      c     2
2      c     3
2      c     4
2      c     5

mozway · Answer 3 · 2022-02-08T21:36:28.880

Use Index.repeat to replicate the rows and groupby+cumcount: to enumerate them:

(df.loc[df.index.repeat(df['Times'])]
   .assign(Times=lambda d: d.groupby('String').cumcount().add(1))
 )

Output:

  String  Times
0      a      1
0      a      2
1      b      1
1      b      2
1      b      3
2      c      1
2      c      2
2      c      3
2      c      4
2      c      5
3      d      1
...

comparison of all answers (7k rows):

@mozway
11.2 ms ± 1.63 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

@Onyambu
16 ms ± 685 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

@ifly6
27.3 ms ± 2.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

@ddejohn
28.5 ms ± 4.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

On 700k rows: 1s, 1.43s, 2.34s, 2.94s in the same order

score 1 · Answer 4 · answered Feb 08 '22 at 21:08

Use df.explode. Make this work by converting your repetitions to list:

s = ['a','b', 'c', 'd', 'e', 'f', 'g']
t = [2, 3, 5, 3, 2, 4, 5]

Then,

times = [list(range(i + 1)) for i in t]
df = pd.DataFrame({'string': s, 'times': times}).explode('times')

>>> df.head(10)
  string times
0      a     0
0      a     1
0      a     2
1      b     0
1      b     1
1      b     2
1      b     3
2      c     0
2      c     1
2      c     2

If you don't want the index repeated like that, df.reset_index(drop=True, inplace=True) (if in place, otherwise, reassign). If you want 1-indexing rather than 0-indexing, for list(range(i + 1)) substitute list(range(1, i + 2)). Alternatively, just add one to the times column at the end. The times column doesn't really matter for df.explode, it explodes based on repetitions anyway.

score 0 · Answer 5 · answered Feb 22 '23 at 02:11

ss1=df.String.mul(df.Times).map(list).explode("")
ss2=ss1.groupby(ss1).cumcount().add(1)
pd.DataFrame(dict(String=ss1,Times=ss2))

Output:

   String  Times
0       a      1
1       a      2
2       b      1
3       b      2
4       b      3
5       c      1
6       c      2
7       c      3
8       c      4
9       c      5
10      d      1
11      d      2
12      d      3
13      e      1
14      e      2
15      f      1
16      f      2
17      f      3
18      f      4
19      g      1
20      g      2
21      g      3
22      g      4
23      g      5

replicate rows by n times in python

5 Answers5

comparison of all answers (7k rows):