How can I drop rows based on whether another row satisfies some condition?

Question

Consider the dataframe df

   A  B  C  D  match?
0  x  y  1  1  true
1  x  y  1  2  false
2  x  y  2  1  false
3  x  y  2  2  true
4  x  y  3  4  false
5  x  y  5  6  false

I would like to drop the unmatched rows whose values are already matched in some other row:

   A  B  C  D  match?
0  x  y  1  1  true
3  x  y  2  2  true
4  x  y  3  4  false
5  x  y  5  6  false

How can I do that with Pandas?

Nickil Maveli · Accepted Answer · 2017-01-21T06:54:01.187

3

You could sort the values of those two columns within each row so that their order is the same throughout. Then drop every duplicated row by passing keep=False to the DataFrame.drop_duplicates() method.

df[['C','D']] = np.sort(df[['C','D']].values)
df.drop_duplicates(keep=False)
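
For reference, here is a self-contained sketch of the same idea run against the sample data from the question. The names normalized, is_mirror_pair and result are only illustrative, and working on a copy (so the original C/D order survives) is an extra precaution on my part, not something the answer requires.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "A": ["x"] * 6,
    "B": ["y"] * 6,
    "C": [1, 1, 2, 2, 3, 5],
    "D": [1, 2, 1, 2, 4, 6],
    "match?": [True, False, False, True, False, False],
})

# Sort C and D within each row on a copy, so (1, 2) and (2, 1) compare equal
# while df itself keeps its original values.
normalized = df.copy()
normalized[["C", "D"]] = np.sort(df[["C", "D"]].to_numpy(), axis=1)

# duplicated(keep=False) flags every member of a duplicate group,
# which mirrors drop_duplicates(keep=False).
is_mirror_pair = normalized.duplicated(keep=False)

result = df[~is_mirror_pair]
print(result)
```

On this data, rows 1 and 2 are the only ones flagged, so rows 0, 3, 4 and 5 are kept.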

edited Jan 21 '17 at 06:54

answered Jan 20 '17 at 14:48

Nickil Maveli


This seems to do the trick, although you have to be careful because the "C" and "D" values can be swapped (if D is greater than C, which is not the case here) – fast_cen Jan 20 '17 at 15:06
Yeah, that's why I had to sort them before so that they're uniform throughout. – Nickil Maveli Jan 20 '17 at 15:10

piRSquared · Answer 2 · 2017-01-20T14:55:35.303

2

You can compare the two columns with:

df.C == df.D

0     True
1    False
2    False
3     True
4    False
dtype: bool

Then shift the series down by one row with .shift():

0      NaN
1     True
2    False
3    False
4     True
dtype: object

Each True value indicates the start of a new group. We can use cumsum to create the groupings we need for groupby:

(df.C == df.D).shift().fillna(False).cumsum()

0    0
1    1
2    1
3    1
4    2
dtype: int64

Then use groupby + last:

df.groupby(df.C.eq(df.D).shift().fillna(False).cumsum()).last()

   A  B  C  D
0  x  y  1  1
1  x  y  2  2
2  x  y  3  4
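
Putting the steps above together, a runnable sketch might look like the following. It assumes the five-row frame that the outputs in this answer are based on, and it uses shift(fill_value=False) in place of shift().fillna(False); that substitution is mine, made to keep the Series boolean throughout. The variable names are illustrative only.

```python
import pandas as pd

# Five-row frame consistent with the outputs shown in this answer.
df = pd.DataFrame({
    "A": ["x"] * 5,
    "B": ["y"] * 5,
    "C": [1, 1, 2, 2, 3],
    "D": [1, 2, 1, 2, 4],
})

# True where the row is already "matched".
matched = df["C"].eq(df["D"])

# Shift down one row; fill_value=False fills the first slot instead of NaN.
starts_new_group = matched.shift(fill_value=False)

# Running count of group starts gives one integer label per group.
group_id = starts_new_group.astype(int).cumsum()

# Keep the last row of each group.
result = df.groupby(group_id).last()
print(result)
```

Because trailing unmatched rows all land in the same group, only the last of them survives, so this sketch inherits that limitation.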

edited Jan 20 '17 at 14:55

answered Jan 20 '17 at 14:05

piRSquared


Your solution makes an assumption about the DataFrame values. – fast_cen Jan 20 '17 at 14:58
@fast_cen what assumption would that be? – piRSquared Jan 20 '17 at 14:59
I'm updating the question with a more complete dataframe. Thanks for the help though ! – fast_cen Jan 20 '17 at 15:01
If two "unmatched" rows follow each other, you consider them as one group. – fast_cen Jan 20 '17 at 15:03
@fast_cen, you mean at the end? Yes... that's true. I'll update my answer. – piRSquared Jan 20 '17 at 15:04

score 0 · Answer 3 · answered Jan 20 '17 at 14:45


If you would like to remove the rows where "C" and "D" match, the .loc indexer will help you:

df = df.loc[df['C'] != df['D']]

Here, df['C'] != df['D'] generates a boolean Series, and .loc uses it to extract the corresponding rows of the DataFrame :)
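
As a quick check, here is a small sketch of that boolean-mask filter on the question's sample data. It uses .loc because .ix was removed in pandas 1.0; the names mask and unmatched are illustrative only.

```python
import pandas as pd

# C and D columns copied from the question's sample frame.
df = pd.DataFrame({
    "A": ["x"] * 6,
    "B": ["y"] * 6,
    "C": [1, 1, 2, 2, 3, 5],
    "D": [1, 2, 1, 2, 4, 6],
})

# Boolean Series: True where C and D differ.
mask = df["C"] != df["D"]

# .loc with a boolean Series keeps only the rows where the mask is True.
unmatched = df.loc[mask]
print(unmatched)
```

Note that this keeps rows 1 and 2 and drops rows 0 and 3, which is the opposite of the output the question asks for.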


Sacha Vakili

