
I have a dataset where I want to eliminate all data above the 95th percentile of a variable, let's say 'col1'. I used `np.quantile` to calculate the 95th-percentile value and kept only the values below it:

```python
import numpy as np
df_95 = df[df['col1'] < np.quantile(df['col1'], 0.95)]
```

The problem is that many rows share the 95th-percentile value, and my code removes all of them, starting from the first occurrence of that value. I want the cut to happen at the exact 95th-percentile position instead.
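To illustrate why the filter behaves this way: a comparison against a single threshold value treats tied rows as a group, so every row equal to the quantile is either dropped (`<`) or kept (`<=`). The toy data below is hypothetical, chosen so that 95% of 20 rows would be 19 rows; neither comparison returns 19.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 20 rows, the largest 6 values are tied at 10800.
df = pd.DataFrame({"col1": list(range(1, 15)) + [10800] * 6})

q = np.quantile(df["col1"], 0.95)      # linear interpolation by default
below = df[df["col1"] < q]             # drops every row equal to q
at_or_below = df[df["col1"] <= q]      # keeps every row equal to q

print(q, len(below), len(at_or_below))  # 10800.0 14 20
```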

  • Hi, welcome to SO - can you provide a sample input and the expected output? – Mortz Jul 01 '21 at 12:14
  • Please clarify as to what data isn't being removed from your data frame by that filter. – ifly6 Jul 01 '21 at 13:13
  • I have a dataset where the 95th-percentile value of a column is 10800, e.g. [1, 2, 3, 4, 5, ............, 10500, _10800_, 10800, 10800, **10800**, 10800, .......]. The value in bold is the 95th-percentile value. When I use the code to remove all values above the 95th percentile, it removes every value from the first instance of 10800 (shown in italics) onward, so the resulting dataset is [1, 2, 3, 4, 5, ............, 10500], which leaves me with 94.95% of the data instead of 95%. What I want my output to be is [1, 2, 3, 4, 5, ............, 10500, _10800_, 10800, 10800]. – Akhilesh Panigrahi Jul 02 '21 at 05:35
  • I voted to close this till the OP can include sample input data and the expected output, @AkhileshPanigrahi read [How to make good reproducible pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – yudhiesh Jul 03 '21 at 03:42
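A minimal sketch of one way to get the output described in the comments, assuming pandas is available and that "exactly 95%" means keeping the lowest `95 * n // 100` rows: rank the column with `Series.rank(method="first")`, which breaks ties by order of appearance, and keep the rows whose rank falls within that count. The function name `keep_lowest_pct`, its `pct` parameter, and the toy data are hypothetical; rows with NaN in the column get a NaN rank and are dropped.

```python
import pandas as pd


def keep_lowest_pct(df: pd.DataFrame, col: str, pct: int = 95) -> pd.DataFrame:
    """Hypothetical helper: keep the pct * len(df) // 100 rows with the
    smallest values of `col`, splitting ties at the exact position."""
    # Integer arithmetic avoids float rounding in the row count.
    n_keep = pct * len(df) // 100
    # method="first" assigns distinct ranks 1..n; tied values are ranked
    # in the order they appear, so earlier duplicates are kept first.
    ranks = df[col].rank(method="first")
    return df[ranks <= n_keep]


# Hypothetical toy data: 20 rows, the largest 6 values are tied at 10800.
df = pd.DataFrame({"col1": list(range(1, 15)) + [10800] * 6})
df_95 = keep_lowest_pct(df, "col1")
print(len(df_95), (df_95["col1"] == 10800).sum())  # 19 5
```

Ranking rather than sorting keeps the surviving rows in their original order; `df.sort_values(col, kind="mergesort").head(n_keep)` would select the same rows but return them sorted by `col`.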
