I have two dataframes - df1
and df2
.
df1 has row1,row2,row3,row4,row5
df2 has row2,row5
I want to have a new dataframe such that df1-df2
. That is, the resultant dataframe should have rows as - row1,row3,row4
.
I have two dataframes - df1
and df2
.
df1 has row1,row2,row3,row4,row5
df2 has row2,row5
I want to have a new dataframe such that df1-df2
. That is, the resultant dataframe should have rows as - row1,row3,row4
.
You can use pandas.concat
to concatenate the two dataframes rowwise, followed by drop_duplicates
to remove all the duplicated rows in them.
In [1]: import pandas as pd
df_1 = pd.DataFrame({"A":["foo", "foo", "foo", "bar"], "B":[0,1,1,1], "C":["A","A","B","A"]})
df_2 = pd.DataFrame({"A":["foo", "bar", "foo", "bar"], "B":[1,0,1,0], "C":["A","B","A","B"]})
In [2]: df = pd.concat([df_1, df_2])
In [3]: df
Out[3]:
A B C
0 foo 0 A
1 foo 1 A
2 foo 1 B
3 bar 1 A
0 foo 1 A
1 bar 0 B
2 foo 1 A
3 bar 0 B
In [4]: df.drop_duplicates(keep=False)
Out[4]:
A B C
0 foo 0 A
2 foo 1 B
3 bar 1 A
You can use the index.difference()
function
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randn(5, 2), index= ['row' + str(i) for i in range(1, 6)])
df1
0 1
row1 0.249451 -0.107651
row2 1.295390 -1.773707
row3 -0.893647 -0.683306
row4 -1.090551 0.016833
row5 0.864612 0.369138
df2 = pd.DataFrame(np.random.randn(2, 2), index= ['row' + str(i) for i in [2, 5]])
df2
0 1
row2 0.549396 -0.675574
row5 1.348785 0.942216
df1.loc[df1.index.difference(df2.index), ]
0 1
row1 0.249451 -0.107651
row3 -0.893647 -0.683306
row4 -1.090551 0.016833
df = df1.drop_duplicates().merge(df2.drop_duplicates(), on=df2.columns.to_list(),
how='left', indicator=True)
df.loc[df._merge=='left_only',df.columns!='_merge']
Note that drop duplicated is used to minimize the comparisons. It would work without them as well.
The best way is to compare the row contents themselves and not the index or one/two columns and same code can be used for other filters like 'both' and 'right_only' as well to achieve similar results. index.difference
only works for unique index based comparisons
For these kind of questions, see left join in pandas.