Replicate X times specific rows of pandas dataframe

Question

I have a dataframe and want to replicate specific rows based on the value in a column. I am looking for a simple way to do it. For example, in the dummy data below, how can I replicate the rows where Age is 24 or 22 so that each of them appears 2 times?

import pandas as pd

data = [['Karan', 23], ['Rohit', 22], ['Macron', 22], ['Sahil', 21], ['Aryan', 24]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

My final dataframe should look like this (row order is not important):

Name Age
Karan 23
Rohit 22
Rohit 22
Macron 22
Macron 22
Sahil 21
Aryan 24
Aryan 24

I tried [https://stackoverflow.com/questions/24029659/python-pandas-replicate-rows-in-dataframe], but it does not work for data with strings.

Does row order matter? – Riccardo Bucco Jul 01 '22 at 12:26
Not important, any order is OK. – pranav nerurkar Jul 01 '22 at 12:31

mozway · Accepted Answer · 2022-07-01T12:52:57.903

1

You can use Index.repeat:

df.loc[df.index.repeat(df['Age'].isin([22,24]).add(1))]

How it works:

determine whether Age is in [22,24]
add 1 (the False values become 1, the True become 2)
repeat each index label that many times and select the rows with df.loc
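The steps above can be spelled out one at a time. This is only an illustrative sketch on the question's sample data; the intermediate names mask, counts and repeated_index are introduced here for readability and are not part of the one-liner.

```python
import pandas as pd

data = [['Karan', 23], ['Rohit', 22], ['Macron', 22], ['Sahil', 21], ['Aryan', 24]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Step 1: boolean mask, True where Age is 22 or 24
mask = df['Age'].isin([22, 24])

# Step 2: adding 1 turns False into 1 and True into 2 (the repeat count per row)
counts = mask.add(1)

# Step 3: repeat each index label according to its count
repeated_index = df.index.repeat(counts)

# Step 4: select rows by the repeated labels; duplicated labels give duplicated rows
out = df.loc[repeated_index]
print(out)
```

Because the selection goes through the index labels, the column dtypes are untouched, which is why this also works for string columns.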

Or, for more flexibility, use numpy.where to pick any repeat count you want:

import numpy as np
df.loc[df.index.repeat(np.where(df['Age'].isin([22,24]), 2, 1))]

output:

     Name  Age
0   Karan   23
1   Rohit   22
1   Rohit   22
2  Macron   22
2  Macron   22
3   Sahil   21
4   Aryan   24
4   Aryan   24

resetting the index:

df.loc[df.index.repeat(df['Age'].isin([22,24]).add(1))].reset_index(drop=True)

output:

     Name  Age
0   Karan   23
1   Rohit   22
2   Rohit   22
3  Macron   22
4  Macron   22
5   Sahil   21
6   Aryan   24
7   Aryan   24
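If different Age values need different repeat counts, one possible extension of the numpy.where idea is to look the counts up in a dictionary. This is a sketch under assumptions: the repeats mapping and its values (22 twice, 24 three times) are hypothetical and chosen only for illustration.

```python
import pandas as pd

data = [['Karan', 23], ['Rohit', 22], ['Macron', 22], ['Sahil', 21], ['Aryan', 24]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Hypothetical mapping: Age value -> total number of copies wanted
repeats = {22: 2, 24: 3}

# Ages missing from the mapping become NaN after map(); fill them with 1 (keep once)
counts = df['Age'].map(repeats).fillna(1).astype(int)

# Repeat the index labels by their counts, select, then rebuild a clean 0..n-1 index
out = df.loc[df.index.repeat(counts)].reset_index(drop=True)
print(out)
```

A count of 0 in the mapping would drop the matching rows entirely, since Index.repeat then emits no label for them.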

edited Jul 01 '22 at 12:52

answered Jul 01 '22 at 12:28

mozway


It is working, but it is also duplicating the index. And when I reset the index with drop=True and inplace=True, the duplicate rows are deleted. – pranav nerurkar Jul 01 '22 at 12:44
If you add `.reset_index(drop=True)`, this should give you a clean index while keeping the new duplicates – mozway Jul 01 '22 at 12:51

score 1 · Answer 2 · answered Jul 01 '22 at 12:39

Here is a possible solution (row order is not preserved):

import numpy as np
import pandas as pd

mask = df['Age'].isin([22, 24])
df = pd.DataFrame(np.concatenate((
    np.repeat(df[mask].values, 2, axis=0),
    df[~mask].values
)), columns=df.columns)

Output:

     Name Age
0   Rohit  22
1   Rohit  22
2  Macron  22
3  Macron  22
4   Aryan  24
5   Aryan  24
6   Karan  23
7   Sahil  21
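One caveat with going through .values: for a frame with mixed string and integer columns the underlying array has object dtype, so the Age column of the rebuilt dataframe is object rather than integer. A possible alternative that keeps the dtypes is pd.concat on the masked slices. This is a sketch that swaps in a different technique, not the code above, and the row order differs again.

```python
import pandas as pd

data = [['Karan', 23], ['Rohit', 22], ['Macron', 22], ['Sahil', 21], ['Aryan', 24]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

mask = df['Age'].isin([22, 24])

# Stack the matching rows twice, then the non-matching rows once;
# ignore_index=True produces a fresh 0..n-1 index
out = pd.concat([df[mask], df[mask], df[~mask]], ignore_index=True)

print(out)
print(out.dtypes)
```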
