2

I have two pandas data frame, each of which have a date column. I need to find the rows that do not have a common date

Assuming that I created an index on the date column, there are solutions to finding the rows with common index like this But I cannot find any elegant solution to finding the rows that do not have a common date. e.g. df1

                      values 1

28/11/2000          -0.055276
29/11/2000           0.027427
30/11/2000           0.066009
01/12/2000           0.012749
04/12/2000           0.113892

df2

                       values 2

24/11/2000            -0.004808
27/11/2000            -0.001812
28/11/2000            -0.026316
29/11/2000             0.015222
30/11/2000            -0.024480

give two dataframes: one which contains the rows in df1 that are not having common dates

df11

                     value 1

01/12/2000           0.012749
04/12/2000           0.113892

And the other which contains the rows in df2 that are not having the common dates df22

                       value 2
24/11/2000            -0.004808
27/11/2000            -0.001812

I'm okay with any other format of output also e.g having both the data frame output in one single data frame, if we can reduce the number of steps.

There are solutions to finding the rows that are not equal to each other like this, but then each element of the row has to be the same.

Can anyone help figure out an elegant solution. Thank you for any input

Ramana
  • 243
  • 4
  • 15

3 Answers3

2

Without reindexing, I would get the unique dates from each df and then mask the other using those dates. Something like

df1 = df1[~df1['date'].isin(df2['date'].unique())]
df2 = df2[~df2['date'].isin(df1['date'].unique())]
wpercy
  • 9,636
  • 4
  • 33
  • 45
  • All 3 solutions including this one and the one using index.difference and merge worked. Choosing this one as the answer because it is the least complicated at least for me and does not require setting an index. – Ramana Feb 18 '19 at 13:37
1

Use Index.difference:

df11 = df1.loc[df1.index.difference(df2.index)]
print (df11)
            values 1
01/12/2000  0.012749
04/12/2000  0.113892

df22 = df2.loc[df2.index.difference(df1.index)]
print (df22)
            values 2
24/11/2000 -0.004808
27/11/2000 -0.001812

Numpy solution with numpy.setdiff1d:

df11 = df1.loc[np.setdiff1d(df1.index, df2.index)]
df22 = df2.loc[np.setdiff1d(df2.index, df1.index)]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

There is magic from merge using indicator

s=df1.merge(df2,left_index=True,right_index=True,indicator=True,how='outer')
df1_1=s.loc[s['_merge']=='left_only',['values1']]
df2_2=s.loc[s['_merge']=='right_only',['values2']]
df1_1
             values1
01/12/2000  0.012749
04/12/2000  0.113892
df2_2
             values2
24/11/2000 -0.004808
27/11/2000 -0.001812
BENY
  • 317,841
  • 20
  • 164
  • 234