
I have a DataFrame in which I'm trying to compare two columns. If the two values in a row are approximately equal, I want that row dropped from the DataFrame. For example:

   A       B  
1  3.21   3.15
2  6.98   2.07
3  5.41   8.95
4  0.32   0.30

I want only rows 2 and 3 to remain in the df, because in rows 1 and 4 the values of A and B are similar to each other.

I tried something like "if the value in column A is within +/- 15% of the value in column B in the same row, remove that row", but it didn't work. Is there a built-in pandas function for this?

  • Looks like you want to conditionally drop rows, is that correct? This previous post may help: https://stackoverflow.com/questions/13851535/how-to-delete-rows-from-a-pandas-dataframe-based-on-a-conditional-expression I think it would look something like `df.drop(df[((df.A - df.B) / df.B).abs() < .15].index)` – bartius Nov 05 '21 at 04:32
  • @bartius did not know about this. Thanks! – piesandcustards Nov 05 '21 at 04:44
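
If you write the conditional drop by hand, the relative difference needs an absolute value. With a signed difference, any row where A is well below B gives a negative ratio and gets dropped too. A minimal sketch, assuming the sample data from the question and no zeros in B (the names `rel_diff`, `kept`, `signed` and `kept_signed` are only illustrative):

```python
import pandas as pd

# Sample data from the question, indexed 1..4 as shown there.
df = pd.DataFrame(
    {"A": [3.21, 6.98, 5.41, 0.32], "B": [3.15, 2.07, 8.95, 0.30]},
    index=[1, 2, 3, 4],
)

# Relative difference of A with respect to B, ignoring the sign.
rel_diff = ((df["A"] - df["B"]) / df["B"]).abs()

# Drop the rows where the two columns are within 15% of each other.
kept = df.drop(df[rel_diff < 0.15].index)

# For comparison: the signed version also drops row 3, where A < B.
signed = (df["A"] - df["B"]) / df["B"]
kept_signed = df.drop(df[signed < 0.15].index)
```

Here `kept` holds rows 2 and 3, while `kept_signed` is left with row 2 only.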

2 Answers


You could do this by passing the rtol parameter to numpy.isclose:

import numpy as np

# Keep only the rows where A is NOT within 15% (relative to B) of B.
result = df[~np.isclose(df.A, df.B, atol=0, rtol=0.15)]
#       A     B
# 2  6.98  2.07
# 3  5.41  8.95
hilberts_drinking_problem
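
For reference, numpy.isclose(a, b) tests |a - b| <= atol + rtol * |b|, so the tolerance is measured relative to the second argument (column B here), and atol=0 makes the check purely relative. A small sketch on the question's sample data, where `manual` is just an illustrative name for the same test written out by hand:

```python
import numpy as np
import pandas as pd

# Sample data from the question, indexed 1..4 as shown there.
df = pd.DataFrame(
    {"A": [3.21, 6.98, 5.41, 0.32], "B": [3.15, 2.07, 8.95, 0.30]},
    index=[1, 2, 3, 4],
)

# Boolean array: True where A is within 15% of B (relative to B).
close = np.isclose(df["A"], df["B"], rtol=0.15, atol=0)

# The same test written out by hand, to show what isclose computes.
manual = (df["A"] - df["B"]).abs() <= 0.15 * df["B"].abs()

# Keep the rows that are not close.
result = df[~close]
```

Because the tolerance scales with B, swapping the two arguments can give a different answer for values near the boundary.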

You could define lower and upper bounds on the permissible values:

lower = df["A"]*0.85
upper = df["A"]*1.15

and then filter using pandas.Series.between

df[~df["B"].between(lower, upper)]
Riley
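
Two things to keep in mind with the between approach: pandas.Series.between includes both endpoints by default, and bounds built as A*0.85 and A*1.15 swap places when A is negative, so no value can fall between them. A hedged sketch that builds the bounds from the absolute value instead (the `tol`, `neg`, `naive_similar` and `safe_similar` names are illustrative, not from the answer):

```python
import pandas as pd

# Sample data from the question, indexed 1..4 as shown there.
df = pd.DataFrame(
    {"A": [3.21, 6.98, 5.41, 0.32], "B": [3.15, 2.07, 8.95, 0.30]},
    index=[1, 2, 3, 4],
)

# Sign-safe bounds: 15% of |A| on either side of A.
tol = 0.15 * df["A"].abs()
result = df[~df["B"].between(df["A"] - tol, df["A"] + tol)]

# A hypothetical negative row to show why the sign matters.
neg = pd.DataFrame({"A": [-3.21], "B": [-3.15]})

# Naive bounds: lower ends up greater than upper, so nothing matches.
naive_similar = neg["B"].between(neg["A"] * 0.85, neg["A"] * 1.15)

# Sign-safe bounds correctly flag the pair as similar.
neg_tol = 0.15 * neg["A"].abs()
safe_similar = neg["B"].between(neg["A"] - neg_tol, neg["A"] + neg_tol)
```

On the question's all-positive data both versions keep rows 2 and 3, so the difference only shows up once negative values are present.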