Pandas: drop row if one column duplicate and another column distinct

Question

I want to drop rows in a pandas dataframe where value in one column A is duplicate and value in some other column B is not a duplicate given A. An illustrative example:

df = pd.DataFrame({'A': ['cat', 'dog', 'cat', 'cat', 'bat'],
                   'B': ['x', 'y', 'x', 'z', 'z'],
                   'C': [10, 20, 30, 40, 50]})

The desired output has Row 3 dropped as "cat" is duplicate and "z" is distinct in "cat" rows:

df.drop_duplicates('A') removes Rows 2 and 3, and df.drop_duplicates(subset=['A', 'B']) removes Row 2; neither is what I am looking for here [1].

score 2 · Accepted Answer · answered Dec 13 '19 at 19:48

2

Maybe this:

df[(~df.duplicated('A')) | df.duplicated(['A','B'])]

Output:

     A  B   C
0  cat  x  10
1  dog  y  20
2  bat  z  30
3  cat  x  40

answered Dec 13 '19 at 19:48

Quang Hoang

146,074
10
56
74

It seems to work. I try it on the original dataset too to see it it works there too. Thanks! (also, I had a typo in the question code that I just corrected) – Reveille Dec 13 '19 at 19:59

score 2 · Answer 2 · answered Dec 13 '19 at 20:01

2

IIUC transform

df[df.B==df.groupby('A').B.transform('first')]
Out[11]: 
     A  B   C
0  cat  x  10
1  dog  y  20
2  cat  x  30
4  bat  z  50

answered Dec 13 '19 at 20:01

BENY

317,841
20
164
234

Pandas: drop row if one column duplicate and another column distinct

2 Answers2