My DataFrame df has shape 3020x4. I'd like to remove a subset df1 (20x4) from it. In other words, I want the difference, whose shape should be 3000x4. I tried the code below, but it did not work: it returned exactly df. Would you please help? Thanks.

new_df = df.drop(df1)
XUTADO
  • What is this subset? Is it a number of index values, specific values, etc.? – EdChum Sep 09 '16 at 09:20
  • Or do you just want to diff the two dfs? For example: `merged = df.merge(df1, indicator=True, how='left')` followed by `merged[merged['_merge'] == 'left_only']` – EdChum Sep 09 '16 at 09:25

2 Answers

As you seem to be unable to post a representative example, I will demonstrate one approach using merge with the parameter indicator=True.

So generate some data:

In [116]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 3), columns=list('abc'))
df

Out[116]:
          a         b         c
0 -0.134933 -0.664799 -1.611790
1  1.457741  0.652709 -1.154430
2  0.534560 -0.781352  1.978084
3  0.844243 -0.234208 -2.415347
4 -0.118761 -0.287092  1.179237

Take a subset:

In [118]:
df_subset = df.iloc[2:3]
df_subset

Out[118]:
         a         b         c
2  0.53456 -0.781352  1.978084

Now perform a left merge with indicator=True. This adds a _merge column that indicates whether each row is left_only, both, or right_only (the latter won't appear in this example). Then filter the merged df to keep only the left_only rows:

In [121]:
df_new = df.merge(df_subset, how='left', indicator=True)
df_new = df_new[df_new['_merge'] == 'left_only']
df_new

Out[121]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only

For comparison, here is the merged df before filtering:

In [122]:
df.merge(df_subset, how='left', indicator=True)

Out[122]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
2  0.534560 -0.781352  1.978084       both
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only
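
The same anti-join can be wrapped in a small helper that also drops the _merge column afterwards. This is only a sketch: the function name rows_not_in and the sample data are made up for illustration, and it assumes both frames share the same columns and that matching rows hold identical values.

```python
import pandas as pd


def rows_not_in(df, subset):
    """Return the rows of df that have no exact match in subset (hypothetical helper)."""
    # With no `on` argument, merge joins on all columns the two frames share.
    # drop_duplicates keeps repeated subset rows from multiplying the matches.
    merged = df.merge(subset.drop_duplicates(), how='left', indicator=True)
    # Keep rows found only in the left frame, then discard the helper column.
    return merged[merged['_merge'] == 'left_only'].drop(columns='_merge')


# Illustrative data, not from the question.
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': list('vwxyz')})
df_subset = df.iloc[2:3]
df_new = rows_not_in(df, df_subset)
```

Note that a column-based merge ignores the index of both frames and returns a fresh integer index, so the original index labels of df are not preserved in the result.
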
EdChum
  • `index_to_keep = df.index.symmetric_difference(subset.index); df.loc[index_to_keep, :]` – PhilChang Sep 09 '16 at 09:42
  • @PhilChang That assumes the indices, along with their contents, are the same between the larger df and the subset. As the OP hasn't posted any sample data, `merge` will just work here because it uses the column values. – EdChum Sep 09 '16 at 09:44

The pandas cheat sheet also suggests the following technique:

adf[~adf.x1.isin(bdf.x1)]

where x1 is the column being compared, and adf is the dataframe from which the rows whose x1 values also appear in dataframe bdf are taken out.
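
As a sketch of that technique with made-up data (the frames and the values in x1 and x2 are illustrative only): it compares a single column, so any row of adf whose x1 value occurs anywhere in bdf.x1 is removed, even if the other columns differ.

```python
import pandas as pd

# Illustrative data; note that row 'D' differs in x2 between the two frames.
adf = pd.DataFrame({'x1': ['A', 'B', 'C', 'D'], 'x2': [1, 2, 3, 4]})
bdf = pd.DataFrame({'x1': ['B', 'D'], 'x2': [2, 40]})

# isin builds a boolean mask of rows whose x1 appears in bdf.x1; ~ inverts it.
result = adf[~adf.x1.isin(bdf.x1)]
```

Unlike a merge, this boolean selection keeps the original index labels of adf.
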

The OP's specific question can also be solved by passing the subset's index, rather than the subset itself, to drop:

new_df = df.drop(df1.index)
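
A minimal sketch of this, assuming df1 was sliced out of df and therefore still carries the original index labels (the shapes mirror the question; the data and column names are invented):

```python
import numpy as np
import pandas as pd

# Invented 3020x4 frame standing in for the OP's data.
df = pd.DataFrame(np.arange(3020 * 4).reshape(3020, 4), columns=list('abcd'))
df1 = df.iloc[100:120]  # a 20x4 subset that keeps df's index labels

# drop expects row labels, so pass the subset's index rather than the frame.
new_df = df.drop(df1.index)
```

If df1 has been re-indexed (for example after reset_index), its labels no longer identify the original rows, and a value-based approach such as merge or isin is needed instead.
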
gciriani