
I have a dataframe and want to replicate specific rows based on the value in a column, and I'm looking for a simple way to do it. For example, in the dummy data below, how can I duplicate the rows where Age is 24 or 22 so that each of them appears twice?

import pandas as pd

data= [['Karan',23],['Rohit',22],['Macron',22], ['Sahil',21],['Aryan',24]]

df = pd.DataFrame(data, columns=['Name','Age'])

My final dataframe should look like this (row order is not important):

Name Age
Karan 23
Rohit 22
Rohit 22
Macron 22
Macron 22
Sahil 21
Aryan 24
Aryan 24

I tried the approach from [https://stackoverflow.com/questions/24029659/python-pandas-replicate-rows-in-dataframe], but it does not work for data containing strings.

mozway
pranav nerurkar

2 Answers


You can use Index.repeat:

df.loc[df.index.repeat(df['Age'].isin([22,24]).add(1))]

How it works:

  • determine whether Age is in [22,24]
  • add 1 (the False values become 1, the True become 2)
  • repeat each index label that many times and select those rows with df.loc
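The steps above can be traced one at a time. This is only a sketch on the question's dummy data; the intermediate names `mask`, `counts` and `repeated_index` are introduced here for illustration and are not part of the original one-liner.

```python
import pandas as pd

data = [['Karan', 23], ['Rohit', 22], ['Macron', 22], ['Sahil', 21], ['Aryan', 24]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Step 1: boolean mask, True where Age is 22 or 24
mask = df['Age'].isin([22, 24])

# Step 2: adding 1 turns False/True into repeat counts 1/2
counts = mask.add(1)

# Step 3: repeat each index label `counts` times, then select those rows
repeated_index = df.index.repeat(counts)
out = df.loc[repeated_index]

print(counts.tolist())          # per-row repeat counts
print(list(repeated_index))     # index labels after repeating
```

Because `df.loc` accepts repeated labels, selecting with the repeated index returns each matching row as many times as its label occurs.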

Or, for more flexibility, use numpy.where to choose any repeat count you want:

import numpy as np
df.loc[df.index.repeat(np.where(df['Age'].isin([22,24]), 2, 1))]

output:

     Name  Age
0   Karan   23
1   Rohit   22
1   Rohit   22
2  Macron   22
2  Macron   22
3   Sahil   21
4   Aryan   24
4   Aryan   24

Resetting the index:

df.loc[df.index.repeat(df['Age'].isin([22,24]).add(1))].reset_index(drop=True)

output:

     Name  Age
0   Karan   23
1   Rohit   22
2   Rohit   22
3  Macron   22
4  Macron   22
5   Sahil   21
6   Aryan   24
7   Aryan   24
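
If different Age values need different numbers of copies, the same idea might be extended with a lookup table. This is a sketch rather than part of the original answer: the `repeats` dictionary and its counts are hypothetical, chosen only for illustration.

```python
import pandas as pd

data = [['Karan', 23], ['Rohit', 22], ['Macron', 22], ['Sahil', 21], ['Aryan', 24]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Hypothetical mapping: Age value -> total number of copies wanted
repeats = {22: 2, 24: 3}

# Ages missing from the mapping become NaN; default them to a single copy
counts = df['Age'].map(repeats).fillna(1).astype(int)

# Repeat the index labels by those counts, select, and renumber the rows
out = df.loc[df.index.repeat(counts)].reset_index(drop=True)
print(out)
```

With this mapping, the rows with Age 22 appear twice, the row with Age 24 appears three times, and all other rows are kept once.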
mozway
  • It works, but it also duplicates the index, and when I reset the index with drop=True and inplace=True, the duplicate rows are deleted. – pranav nerurkar Jul 01 '22 at 12:44
  • If you add `.reset_index(drop=True)`, this should give you a clean index while keeping the new duplicates. – mozway Jul 01 '22 at 12:51

This is a possible solution (row order is not preserved):

import numpy as np
import pandas as pd

mask = df['Age'].isin([22, 24])
df = pd.DataFrame(np.concatenate((
    np.repeat(df[mask].values, 2, axis=0),
    df[~mask].values
)), columns=df.columns)

Output:

     Name Age
0   Rohit  22
1   Rohit  22
2  Macron  22
3  Macron  22
4   Aryan  24
5   Aryan  24
6   Karan  23
7   Sahil  21
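
One thing to be aware of: `.values` on a frame that mixes strings and integers produces an object array, so the rebuilt frame's Age column comes back with object dtype. If the integer dtype matters, a variant that stays within pandas could look like the sketch below. It swaps `np.concatenate` for `pd.concat`, and the names `dup` and `out` are introduced here for illustration.

```python
import pandas as pd

data = [['Karan', 23], ['Rohit', 22], ['Macron', 22], ['Sahil', 21], ['Aryan', 24]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

mask = df['Age'].isin([22, 24])

# Rows to duplicate, each repeated twice via their index labels
dup = df[mask]
dup = dup.loc[dup.index.repeat(2)]

# Stack the duplicated rows on top of the untouched ones;
# concatenating DataFrames keeps the per-column dtypes
out = pd.concat([dup, df[~mask]], ignore_index=True)
print(out.dtypes)
```

The row order matches the output above (duplicated rows first, then the rest), while Age remains an integer column.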
Riccardo Bucco