
What is the fastest way to expand the rows of a DataFrame (copy each row n times) based on the value in one of its columns? For example, if that column's value in a row is 10, the row should appear 10 times in the result.

Example:

import pandas as pd
df = pd.DataFrame({"A":[1,45], "B":[2,3]})

# <operation that repeats each row the number of times given in column B>

The result should look like this:

A   B
1   2
1   2
45  3
45  3
45  3
pnkjmndhl

3 Answers


You can do this with Index.repeat and loc:

df.loc[df.index.repeat(df['B'])]

Output:

    A  B
0   1  2
0   1  2
1  45  3
1  45  3
1  45  3
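A self-contained version of the same idea, assuming only that pandas is installed. The reset_index(drop=True) step is an optional addition here, for when you want a fresh 0..n-1 index instead of the duplicated labels shown above.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 45], "B": [2, 3]})

# Index.repeat repeats each label df["B"] times, giving labels [0, 0, 1, 1, 1]
repeated_index = df.index.repeat(df["B"])

# .loc with duplicated labels returns the matching row once per occurrence
expanded = df.loc[repeated_index]

# Optional: replace the duplicated labels with a clean 0..n-1 index
expanded = expanded.reset_index(drop=True)
print(expanded)
```

Without the reset_index call you get exactly the output above, with labels 0, 0, 1, 1, 1.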
Quang Hoang

Try using this:

import numpy as np
df.loc[np.repeat(df.index.values, df['B'])]

This repeats each index label the number of times given in column B, and loc then selects the corresponding rows.

You can also look at the question "Python Pandas replicate rows in dataframe". It is not exactly the same need, but it has many solutions for replicating rows that are worth learning from.

  • This gives an error: "TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'" – pnkjmndhl Mar 02 '20 at 20:05
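One possible workaround for that error, offered as an assumption rather than a confirmed fix: the message suggests the repeat counts are int64 while NumPy wants its native index integer type (np.intp), which is 32-bit on 32-bit builds. Casting the counts to np.intp explicitly before calling np.repeat avoids relying on an implicit cast.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 45], "B": [2, 3]})

# Assumption: converting the counts to NumPy's native index type (np.intp)
# means np.repeat never has to do an implicit int64 -> int32 conversion.
counts = df["B"].to_numpy().astype(np.intp)

# Repeat each index label by its count, then select those rows with .loc
expanded = df.loc[np.repeat(df.index.to_numpy(), counts)]
print(expanded)
```

The result keeps the duplicated labels 0, 0, 1, 1, 1, the same as the Index.repeat version.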

You could also use DataFrame.reindex with Index.repeat:

df.reindex(df.index.repeat(df['B']))

Output:

    A  B
0   1  2
0   1  2
1  45  3
1  45  3
1  45  3
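A small self-contained check that the reindex and loc versions agree, assuming pandas is installed. Because every repeated label already exists in df, reindex introduces no missing rows, so the column dtypes are preserved and both calls return the same frame.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 45], "B": [2, 3]})
repeated = df.index.repeat(df["B"])

via_reindex = df.reindex(repeated)
via_loc = df.loc[repeated]

# Raises AssertionError if values, index labels, or dtypes differ
pd.testing.assert_frame_equal(via_reindex, via_loc)
print(via_reindex)
```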

If the counts in column B are not stored as integers, cast them before repeating:

df.reindex(df.index.repeat(df['B'].astype(int)))
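A sketch for the case where column B holds whole-number counts stored as floats (a hypothetical input chosen for illustration). Here the counts themselves are cast with astype(int) before they reach Index.repeat, so the repeat always receives integers.

```python
import pandas as pd

# Hypothetical input: the repeat counts in B are floats, not integers
df = pd.DataFrame({"A": [1, 45], "B": [2.0, 3.0]})

# Cast the counts (not the resulting index) so Index.repeat gets integers
counts = df["B"].astype(int)
expanded = df.reindex(df.index.repeat(counts))
print(expanded)
```

Note that astype(int) truncates fractional values, so this only makes sense when the floats are known to be whole numbers.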
ansev