
I am treating outliers in a variable of a dataset with 614 observations, but I can't manage to write the code.

I am working on a dataset with 614 observations and, for various reasons, I don't want to trim it when treating the variables. I want any value greater than the 91st percentile to be set equal to the 91st percentile, without reducing the number of observations.

In simple words, I want to replace all values above 10171.5 (91st percentile) with 10171.5.

smci
  • How is your data represented/stored? What have you tried so far? – Simon Notley Aug 31 '19 at 21:03
  • It is stored as a dataframe. 1. I tried using the df.loc to extract rows having a value greater than the 91st percentile and then change it using replace but it was showing an error. 2. I also tried conditional statements, but even they didn't work. – ABHIRAJ SURESH Sep 01 '19 at 10:02
  • In that case you have an answer below. It's best to post your code and the error you get; otherwise it's very difficult for people to give you relevant answers. Luckily the person below guessed correctly. We still have no idea why your code didn't work, though. – Simon Notley Sep 01 '19 at 18:36
  • You have to **post your failing code (and also the error traceback it causes)**, otherwise we can't know what your precise issue is. If Mario's answer answers your question then please consider accepting it. If not, tell us what the problem is (i.e. post the failing code with traceback). As it stands, this question is not a reusable resource. – smci Sep 17 '19 at 06:22
  • Welcome to SO. Please read the guide on how to ask a question, especially the part about [questions having to have an MCVE (Minimal, Complete Verifiable Example)](https://stackoverflow.com/help/minimal-reproducible-example) – smci Sep 17 '19 at 06:38

1 Answer

If your data is in the form of a DataFrame, you can get the quantile of each column via:

q = df["col"].quantile(0.91)

and then filter with (note that this drops every row at or above the quantile, so the number of observations shrinks):

df[df["col"] < q]
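Since the question asks to cap values rather than drop rows, here is a minimal sketch of that alternative using `Series.clip`. The toy data and the column name `"col"` are placeholders, not the asker's actual 614-row dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataframe.
df = pd.DataFrame({"col": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]})

# 91st percentile of the column (pandas interpolates linearly by default).
q = df["col"].quantile(0.91)

# Cap everything above q at q; values at or below q are left unchanged
# and no rows are removed.
df["col"] = df["col"].clip(upper=q)
```

After this, `len(df)` is unchanged and the column's maximum equals `q`. Capping at a percentile like this is often called (one-sided) winsorizing.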

I also recommend reading this article, which gives a very good overview of outlier detection techniques; after that you can decide whether to clip the outliers or filter them. You can also check answer1, answer2 and the general answer.

Mario