2

I have a dataframe with around 80 columns and a few hundreds of rows, below is an example dataframe. I need to filter the dataframe based on the IQR value then delete the outliers but not the whole row, only the actual value/cell. As far I could define the Q1, Q3 and IQR.

I have found some solutions but they remove the whole row but I want to delete just the outlier value/cell.

list1 = [-1200, 0.75, 85, 92, 115, 116, 227, 252, 271, 308, 314, 443, 509, 717, 793, 805, 891, 903, 1958, 2978]
list2 = [-903, 0.68, 90, 109, 129, 164, 351, 365, 440, 472, 538, 547, 591, 679, 682, 706, 776, 785, 787, 2886]
list3 = [-1123, 0.033, 73, 182, 296, 355, 438, 509, 528, 530, 567, 608, 613, 632, 653, 727, 745, 761, 780, 1985]
list4 = [-1328, 0.363, 89, 106, 131, 165, 171, 202, 335, 339, 379, 399, 425, 488, 513, 657, 728, 761, 840, 1844]

df = pd.DataFrame ({'A' : list1, 'B' : list2, 'C' : list3, 'D' : list4})
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1

So the output dataframe should look like this:

output dataframe

tdy
  • 36,675
  • 19
  • 86
  • 83
  • Did the proposed solution work for you? – quest Mar 23 '21 at 06:41
  • Sorry for the late reply. Thank you for the answer. I got an error message when tried to implement your proposal: AttributeError: module 'pandas' has no attribute 'NA'. So it's not working now, but I got other ideas based on your proposal. So thanks again. – Norbert Mihut Mar 29 '21 at 11:05
  • The proposal is working right now with a bit of modification. Thank you very much for the idea. df[(df < Q1 - 1.5*IQR) | (df > Q3 + 1.5*IQR)] = np.nan – Norbert Mihut Mar 29 '21 at 11:13
  • Do you have an old version of pandas. https://github.com/selik/xport/issues/37 – quest Mar 29 '21 at 11:48
  • It seems. pd.__version__ Out[3]: '0.25.3' Thank you for bringing it to my attention. – Norbert Mihut Mar 30 '21 at 12:22
  • Great, if it solved your problem, pls accept it as answer. Thank you :) – quest Mar 30 '21 at 13:46

1 Answers1

0

You can try something like:

df[(df < Q1 - 1.5*IQR) | (df>Q3 + 1.5*IQR)] = pd.NA      

To set all the outliers to NA.

quest
  • 3,576
  • 2
  • 16
  • 26