I am trying to return a df where duplicate values have been removed. I have tried to use drop_duplicates(), but the values in the columns I pass as the subset aren't ordered: the rows contain the same pair of values, just not in the same order.
For instance, using the df below, if I try to remove duplicates based on Item_X and Item_Y, it returns the same df unchanged. The intended output would remove the second row (Bar, Foo), because it holds the same pair as the first row (Foo, Bar).
import pandas as pd

d = {
    'Item_X': ['Foo', 'Bar', 'Bot', 'Bot', 'Bar', 'Foo'],
    'Item_Y': ['Bar', 'Foo', 'Foo', 'Bot', 'Bar', 'Foo'],
    'Value': [1, 2, 3, 4, 5, 6],
}
df = pd.DataFrame(data=d)

# Only drops rows whose (Item_X, Item_Y) pair matches an earlier row in the same order
print(df.drop_duplicates(subset=['Item_X', 'Item_Y']))
Expected Result:
  Item_X Item_Y  Value
0    Foo    Bar      1
2    Bot    Foo      3
3    Bot    Bot      4
4    Bar    Bar      5
5    Foo    Foo      6
Actual Output (Incorrect):
  Item_X Item_Y  Value
0    Foo    Bar      1
1    Bar    Foo      2
2    Bot    Foo      3
3    Bot    Bot      4
4    Bar    Bar      5
5    Foo    Foo      6
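drop_duplicates keeps exactly the rows that DataFrame.duplicated does not flag, and that comparison treats each (Item_X, Item_Y) pair as ordered. A minimal check, rebuilding the same df, shows that no row is flagged, which is why the output above is unchanged:

```python
import pandas as pd

df = pd.DataFrame({
    'Item_X': ['Foo', 'Bar', 'Bot', 'Bot', 'Bar', 'Foo'],
    'Item_Y': ['Bar', 'Foo', 'Foo', 'Bot', 'Bar', 'Foo'],
    'Value': [1, 2, 3, 4, 5, 6],
})

# duplicated() flags a row only if the exact ordered (Item_X, Item_Y) pair
# already appeared in an earlier row. ('Foo', 'Bar') and ('Bar', 'Foo') differ
# as ordered pairs, so no row is flagged and drop_duplicates removes nothing.
flags = df.duplicated(subset=['Item_X', 'Item_Y'])
print(flags.tolist())  # [False, False, False, False, False, False]
```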
What would be the most efficient way to tackle this problem?
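One common approach, sketched here under the assumption that only Item_X and Item_Y define a duplicate and that the first occurrence should be kept (the drop_duplicates default), is to normalise each pair by sorting its two values within the row and then flag repeats of the normalised pair. This sketch uses numpy's np.sort row-wise; it is one option, not necessarily the fastest for every data size.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Item_X': ['Foo', 'Bar', 'Bot', 'Bot', 'Bar', 'Foo'],
    'Item_Y': ['Bar', 'Foo', 'Foo', 'Bot', 'Bar', 'Foo'],
    'Value': [1, 2, 3, 4, 5, 6],
})

# Sort the two item values inside each row (axis=1), so ('Foo', 'Bar') and
# ('Bar', 'Foo') both become ('Bar', 'Foo'). dtype=str gives a plain string array.
pairs = np.sort(df[['Item_X', 'Item_Y']].to_numpy(dtype=str), axis=1)

# Wrap the normalised pairs in a frame sharing df's index, then flag every row
# whose normalised pair already appeared earlier (keep='first' by default).
dupes = pd.DataFrame(pairs, index=df.index).duplicated()

# Keep only the first occurrence of each unordered pair.
result = df[~dupes]
print(result)
```

Because the mask is built on df's own index, result keeps the original row labels (0, 2, 3, 4, 5), matching the expected output. np.sort is vectorised, so this avoids a Python-level apply over rows; the trade-off is that the two columns are copied into a string array first.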