
I have some erroneous data points in my dataset that I need to get rid of (see image; they are very obvious there). So I need to drop rows based on a dual condition: when column A is greater than or equal to 0.5 AND column B equals 0.


So I tried:

df = df.drop(df[df['A'] >= 0.5 & df['B'] == 0].index, inplace=True)

This results in an error:

cannot compare a dtyped [float64] array with a scalar of type [bool]
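The error comes from operator precedence: in Python, `&` binds more tightly than comparison operators such as `>=` and `==`, so the unparenthesized condition is parsed as `df['A'] >= (0.5 & df['B']) == 0`. Below is a minimal sketch of a corrected call, assuming a small made-up DataFrame (the real data is not shown in the question). It also avoids a second problem in the original line: `drop(..., inplace=True)` returns `None`, so assigning its result back to `df` would replace the DataFrame with `None`.

```python
import pandas as pd

# Hypothetical sample data; the real dataset is not shown in the question.
df = pd.DataFrame({
    "A": [0.1, 0.6, 0.7, 0.4, 0.9],
    "B": [0, 0, 3, 0, 5],
})

# Parenthesize each comparison, because & has higher precedence than >= and ==.
bad_rows = df[(df["A"] >= 0.5) & (df["B"] == 0)].index

# Either keep the returned copy from drop() ...
cleaned = df.drop(bad_rows)

# ... or modify df in place, without assigning the result (inplace=True returns None).
df.drop(bad_rows, inplace=True)

print(cleaned)
```

Only the row with `A = 0.6, B = 0` should be removed from this sample; the row with `A = 0.4, B = 0` stays because its `A` value is below the threshold.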

I then tried to create a mask and drop rows this way:

mask = (df['A'] >= 0.5) & (df['B'] == 0)
df = df.drop(df[mask], axis = 1)

For some reason, this deletes all my data except for the index column.
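A likely explanation, sketched below with hypothetical sample data: `df[mask]` is itself a DataFrame, and `drop()` treats it as a list-like of labels. Iterating over a DataFrame yields its column names, and `axis=1` tells `drop()` to remove columns, so every column is dropped and only the index remains. Passing the row labels from `.index` and dropping along the default row axis removes just the flagged rows.

```python
import pandas as pd

# Hypothetical sample data with the same column names as in the question.
df = pd.DataFrame({
    "A": [0.1, 0.6, 0.7, 0.4, 0.9],
    "B": [0, 0, 3, 0, 5],
})
mask = (df["A"] >= 0.5) & (df["B"] == 0)

# df[mask] is a DataFrame; iterating over it yields its column labels ("A", "B"),
# so drop(..., axis=1) removes every column and leaves only the index.
all_columns_gone = df.drop(df[mask], axis=1)

# Pass the row labels instead and drop along the index (the default axis=0).
cleaned = df.drop(df[mask].index)

# Equivalent: select the rows to keep with the inverted mask.
kept = df[~mask]

print(all_columns_gone.shape)
print(cleaned)
```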

How do I do this properly? Thanks in advance!

Rob
NotAName
  • `df = df[(df['A'] < 0.5) | (df['B'] != 0)]` or in your case: `df = df[~mask]` – Erfan Nov 27 '19 at 22:51
  • Thanks! This worked! Does "~" here mean to invert the selection? – NotAName Nov 27 '19 at 23:07
  • Yes, exactly! The linked duplicate question has lots of valuable information. I suggest you give it a read and also look at the `.query` method. – Erfan Nov 27 '19 at 23:17
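The comment thread can be illustrated with a short sketch (hypothetical sample data again): `~` negates a boolean Series element by element, so `df[~mask]` keeps exactly the rows that `mask` does not flag. Written out explicitly, the negated condition follows De Morgan's law, which turns `&` into `|` and each comparison into its complement. `DataFrame.query` expresses the same filter as a string.

```python
import pandas as pd

# Hypothetical sample data with columns A and B, as in the question.
df = pd.DataFrame({
    "A": [0.1, 0.6, 0.7, 0.4, 0.9],
    "B": [0, 0, 3, 0, 5],
})
mask = (df["A"] >= 0.5) & (df["B"] == 0)

# ~ flips each boolean in the mask, so this keeps every row where mask is False.
kept = df[~mask]

# By De Morgan's law, not (A >= 0.5 and B == 0) equals (A < 0.5) or (B != 0).
kept_explicit = df[(df["A"] < 0.5) | (df["B"] != 0)]

# DataFrame.query accepts the same condition as a string expression.
kept_query = df.query("not (A >= 0.5 and B == 0)")

print(kept)
```

All three should select the same rows here; which form to use is mostly a matter of readability.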
