0

I have the following DataFrame df:

d = {'1': ['25', 'AAA', 2], '2': ['30', 'BBB', 3], '3': ['5', 'CCC', 2], \
     '4': ['300', 'DDD', 2], '5': ['30', 'DDD', 3],  '6': ['100', 'AAA', 3]}

columns=['Price', 'Name', 'Class']

df = pd.DataFrame.from_dict(data=d, orient='index')
df.columns = columns

I want to duplicate rows based on values of the column Class. In particular, I want to randomly select rows where Class is equal to 3, and duplicate them. For example, in current df I have 3 rows with Class equal to 3. How can I create N duplicates, where N is configurable, for example:

N = 2
target_column = "Class"
target_value = 3
new_df = create_duplicates(df, target_column, target_value, N)

I was thinking to use for-loop and at each iteration (when Class is equal to 3) generate a random number. If it's greater than 0.5, then the row is added to a list of selected rows. This process continues until a list of selected rows contains N rows. Then these N rows are appended to df.

Is there a more elegant and shorter way to do the same? Maybe some built-in pandas functions?

Fluxy
  • 2,838
  • 6
  • 34
  • 63
  • I'm not following your question. Do you want to duplicate the entire dataset by a number `N`? or just some specific classes? – Quang Hoang Dec 03 '19 at 20:33
  • @QuangHoang: Sorry if it's unclear. I want to duplicate specific rows by a number `N`. The "specific" rows are those ones that have `Class` equal to `3`. – Fluxy Dec 03 '19 at 20:44

1 Answers1

1

I think this script below will do what you need. I lifted the repetition part from: Repeat Rows in Data Frame n Times

n=3
pd.concat([df,df[df['Class']==3].loc[df.index.repeat(n)].dropna()]).sort_values('Name')
Lukas Thaler
  • 2,672
  • 5
  • 15
  • 31
  • Thanks! How does it select rows? Randomly? Or does it take a first row and duplicate it N times? – Fluxy Dec 03 '19 at 20:50
  • In this case it selects the rows with a class = 3. Can you better define what you mean by randomly? You could select one row by index, randomly; you could select N rows by index, but you might end up randomly selecting the same row more than one, so you could put some controls around that; you could randomly select a "class" and duplicate all the rows with the select class. – james hendricks Dec 03 '19 at 20:57