Duplicate Rows in Pandas Dataframe if Values are in a List

Question

I have a dataframe that looks like this:

fruit   cost
apples  2
oranges 3
bananas 5
grapefruit  1

I want to pass a list that includes specified "fruit" column values and use that to duplicate those rows in the dataframe. For example, duplicated fruits = ['apples', 'oranges'].

These rows should then be copied back into the dataframe with an extra column that denotes that they are a copy (can be a binary 1/0).

I only want to duplicate "oranges" and "apples," so values I have specified. The desired output is to duplicate these rows in the dataframe and create a new column marking which rows are originals and which are copies. — dataelephant, Nov 01 '19 at 13:12

jezrael · Accepted Answer · 2019-11-01T13:47:27.460

3

Use Series.isin for get matching rows and for duplication use DataFrame.append to original data with DataFrame.assign for indicato column:

duplicated  = ['apples', 'oranges']
df1 = df[df['fruit'].isin(duplicated)].assign(new=1)
df = df.assign(new=0).append(df1, ignore_index=True)
print (df)
        fruit  cost  new
0      apples     2    0
1     oranges     3    0
2     bananas     5    0
3  grapefruit     1    0
4      apples     2    1
5     oranges     3    1

Another idea is use parameter keys in concat - it create new level filled by 0 and 1, so necessary DataFrame.reset_index by first level for convert this level for column:

df = (pd.concat([df, df1], keys=(0,1))
       .rename_axis(('new', None))
       .reset_index(level=0)
       .reset_index(drop=True))
print (df)
   new       fruit  cost
0    0      apples     2
1    0     oranges     3
2    0     bananas     5
3    0  grapefruit     1
4    1      apples     2
5    1     oranges     3

edited Nov 01 '19 at 13:47

answered Nov 01 '19 at 13:11

jezrael

822,522
95
1,334
1,252

Thanks! Follow up q - if I want to specify that row now, i.e. where apples is duplicated (fruit = apples, new = 1) and I want to replace "apples" with "apple juice," how could I do that replacement? – dataelephant Nov 01 '19 at 14:50
@dataelephant use `m =(df['fruit'] =='apples') & (df['new'] ==1)` and then `df.loc[m, 'fruit'] = "apple juice"` – jezrael Nov 01 '19 at 16:23
@dataelephant also check [this](https://stackoverflow.com/a/36910033/2901002). – jezrael Nov 01 '19 at 16:25

score 0 · Answer 2 · answered Nov 01 '19 at 13:41

You could use concat:

result = pd.concat([df, df[df.fruit.isin(fruits)].assign(new=1)], sort=False).fillna(0)

Output

        fruit  cost  new
0      apples     2  0.0
1     oranges     3  0.0
2     bananas     5  0.0
3  grapefruit     1  0.0
0      apples     2  1.0
1     oranges     3  1.0

As an alternative you could reindex, with default_value=0, before concat:

filtered = df[df.fruit.isin(fruits)].assign(new=1)

result = pd.concat([df.reindex(columns=filtered.columns, fill_value=0), filtered], sort=False)

print(result)

Output

        fruit  cost  new
0      apples     2    0
1     oranges     3    0
2     bananas     5    0
3  grapefruit     1    0
0      apples     2    1
1     oranges     3    1

Duplicate Rows in Pandas Dataframe if Values are in a List

2 Answers2