26

I want to delete rows when a few conditions are met:

An example dataframe is shown below:

        one       two     three      four
0 -0.225730 -1.376075  0.187749  0.763307
1  0.031392  0.752496 -1.504769 -1.247581
2 -0.442992 -0.323782 -0.710859 -0.502574
3 -0.948055 -0.224910 -1.337001  3.328741
4  1.879985 -0.968238  1.229118 -1.044477
5  0.440025 -0.809856 -0.336522  0.787792
6  1.499040  0.195022  0.387194  0.952725
7 -0.923592 -1.394025 -0.623201 -0.738013
8 -1.775043 -1.279997  0.194206 -1.176260
9 -0.602815  1.183396 -2.712422 -0.377118

I want to delete rows based on the conditions that:

Row with value of col 'one', 'two', or 'three' greater than 0; and value of col 'four' less than 0 should be deleted.

Then I tried to implement as follows:

df = df[df.one > 0 or df.two > 0 or df.three > 0 and df.four < 1]

However, it results in a error message as follows:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Could someone help me on how to delete based on multiple conditions?

cottontail
  • 10,268
  • 18
  • 50
  • 51
fyr91
  • 1,253
  • 5
  • 17
  • 33

2 Answers2

52

For reasons that aren't 100% clear to me, pandas plays nice with the bitwise logical operators | and &, but not the boolean ones or and and.

Try this instead:

df = df[(df.one > 0) | (df.two > 0) | (df.three > 0) & (df.four < 1)]
Brionius
  • 13,858
  • 3
  • 38
  • 49
  • 6
    You want `df = df[((df.one > 0) | (df.two > 0) | (df.three > 0)) & (df.four < 1)]` as to why it's because it's ambiguous to compare arrays as there are potentially multiple matches see this: http://stackoverflow.com/questions/10062954/valueerror-the-truth-value-of-an-array-with-more-than-one-element-is-ambiguous – EdChum Mar 12 '15 at 19:00
  • 1
    Oh, whoops, didn't see the `and` at the end. Edited. – Brionius Mar 12 '15 at 19:11
  • 1
    @Brionius: it's basically because `or` and `and` can't have their behaviour customized by a class. They do what they do based on the result of bool(the_object), and that's it. – DSM Mar 12 '15 at 19:30
  • To delete, say, any row with a string that contains 1 of 20 possible subkeys, [look here](http://stackoverflow.com/a/31663495/3491991) – zelusp Nov 15 '16 at 22:57
0

drop could be used to drop rows

The most obvious way is to constructing a boolean mask given the condition, filter the index by it to get an array of indices to drop and drop these indices using drop(). If the condition is:

Row with value of col 'one', 'two', or 'three' greater than 0; and value of col 'four' less than 0 should be deleted.

then the following works.

msk = (df['one'].gt(0) | df['two'].gt(0) | df['three'].gt(0)) & df['four'].lt(0)
idx_to_drop = df.index[msk]
df1 = df.drop(idx_to_drop)

The first part of the condition, i.e. col 'one', 'two', or 'three' greater than 0 can be written a little concisely with .any(axis=1):

msk = df[['one', 'two', 'three']].gt(0).any(axis=1) & df['four'].lt(0)

Keep the complement of the rows to drop

Deleting/removing/dropping rows is the inverse of keeping rows. So another way to do this task is to negate (~) the boolean mask for dropping rows and filter the dataframe by it.

msk = df[['one', 'two', 'three']].gt(0).any(axis=1) & df['four'].lt(0)
df1 = df[~msk]

query() the rows to keep

pd.DataFrame.query() is a pretty readable API for filtering rows to keep. It also "understands" and/or etc. So the following works.

# negate the condition to drop
df1 = df.query("not ((one > 0 or two > 0 or three > 0) and four < 0)")

# the same condition transformed using de Morgan's laws
df1 = df.query("one <= 0 and two <= 0 and three <= 0 or four >= 0")

All of the above perform the following transformation:

result

cottontail
  • 10,268
  • 18
  • 50
  • 51