I have a dataset and I want to add a column that flags the values that are outliers (more than 1.5 times the IQR). I am currently using this code:

    # Add an additional column for flagging outliers that are beyond 1.5 * interquartile range
    plotdata$OUTLIERFLAG <- 0
    # Cycle through variables
    for (i in 1:length(unique(plotdata$variable))) {
      pms <- unique(plotdata$variable)[i]
      dats <- subset(plotdata, plotdata$variable == pms)
      # Cycle through sampling locations
      for (bore in unique(plotdata$Sample.Point)) {
        subdats <- dats[dats$Sample.Point == bore, ]
        x1 <- match(boxplot.stats(subdats$value2)$out, subdats$value2)
        ifelse(x1 == 0, NULL, plotdata[rownames(subdats[x1, ]), ]$OUTLIERFLAG <- 1)
      }
    }

However, the code sometimes does not work: for identical values, one is flagged as an outlier and the other is not. Please help.

  • Related [calculating the outliers in R](https://stackoverflow.com/questions/12866189/calculating-the-outliers-in-r) – pogibas Dec 31 '18 at 08:00
  • You need to provide a reproducible example of a data set that shows your problem. – Sotos Dec 31 '18 at 08:17
  • Please provide a reproducible example with data. The provided code snippet isn't directly related to the question asked. In addition, are you sure you want to look at a value > IQR as an outlier, or a value > the 75% percentile? – Omri374 Dec 31 '18 at 09:18

1 Answer

Since you did not provide any data, I will use the mtcars dataset. You probably want to define an outlier as a data point above Q3 + 1.5 * IQR. Also, explicit for loops can usually be avoided for basic operations in R, since most of them are vectorized.

    # Keep only the cyl and hp columns
    df <- mtcars[, c(2, 4)]
    # Flag hp values above Q3 + 1.5 * IQR; everything else stays NA
    df$outliers <- ifelse(test = df$hp > quantile(df$hp, probs = 0.75) + IQR(df$hp) * 1.5, yes = "FLAG", no = NA)
    df

    > df
                        cyl  hp outliers
    Mazda RX4             6 110     <NA>
    Mazda RX4 Wag         6 110     <NA>
    Datsun 710            4  93     <NA>
    Hornet 4 Drive        6 110     <NA>
    Hornet Sportabout     8 175     <NA>
    Valiant               6 105     <NA>
    Duster 360            8 245     <NA>
    Merc 240D             4  62     <NA>
    Merc 230              4  95     <NA>
    Merc 280              6 123     <NA>
    Merc 280C             6 123     <NA>
    Merc 450SE            8 180     <NA>
    Merc 450SL            8 180     <NA>
    Merc 450SLC           8 180     <NA>
    Cadillac Fleetwood    8 205     <NA>
    Lincoln Continental   8 215     <NA>
    Chrysler Imperial     8 230     <NA>
    Fiat 128              4  66     <NA>
    Honda Civic           4  52     <NA>
    Toyota Corolla        4  65     <NA>
    Toyota Corona         4  97     <NA>
    Dodge Challenger      8 150     <NA>
    AMC Javelin           8 150     <NA>
    Camaro Z28            8 245     <NA>
    Pontiac Firebird      8 175     <NA>
    Fiat X1-9             4  66     <NA>
    Porsche 914-2         4  91     <NA>
    Lotus Europa          4 113     <NA>
    Ford Pantera L        8 264     <NA>
    Ferrari Dino          6 175     <NA>
    Maserati Bora         8 335     FLAG
    Volvo 142E            4 109     <NA>

The Maserati Bora, with 8 cylinders and 335 horsepower, is the only outlier. A box-and-whisker plot showing the abnormal data point:

    boxplot(df$hp, horizontal = TRUE)
    # Vertical line indicating the outlier limit
    abline(v = quantile(df$hp, probs = 0.75) + IQR(df$hp) * 1.5, col = "red")  # 305.25

[Image: horizontal box plot of hp with the outlier limit drawn as a red vertical line]
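
For the grouped case in the question, one likely cause of the inconsistent flags is that `match()` returns only the position of the first match, so a second row with the same outlying value is never marked. Below is a minimal sketch of a vectorized, per-group alternative. It assumes the column names from the question (`variable`, `Sample.Point`, `value2`); the toy data and the helper `flag_outliers()` are invented for illustration. It also checks both fences (below Q1 - 1.5 * IQR and above Q3 + 1.5 * IQR), which is an assumption about what you want.

```r
# Toy data, invented for illustration; column names follow the question.
plotdata <- data.frame(
  variable     = c(rep("pH", 20), rep("EC", 10)),
  Sample.Point = c(rep("BH1", 10), rep("BH2", 10), rep("BH1", 10)),
  value2       = c(
    7.0, 7.0, 7.1, 7.1, 7.1, 7.2, 7.2, 7.3, 9.5, 9.5,   # pH at BH1: 9.5 occurs twice
    9.0, 9.2, 9.3, 9.4, 9.5, 9.5, 9.6, 9.7, 9.8, 10.0,  # pH at BH2: 9.5 is ordinary here
    100, 100, 495, 498, 500, 502, 503, 505, 507, 510    # EC at BH1: 100 occurs twice
  )
)

# Hypothetical helper: returns 1 for values outside the 1.5 * IQR fences, else 0.
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  fence <- k * (q[[2]] - q[[1]])
  as.integer(!is.na(x) & (x < q[[1]] - fence | x > q[[2]] + fence))
}

# One group per observed variable / sampling-location combination.
grp <- interaction(plotdata$variable, plotdata$Sample.Point, drop = TRUE)

# ave() applies the helper within each group and keeps the original row order.
plotdata$OUTLIERFLAG <- ave(plotdata$value2, grp, FUN = flag_outliers)

plotdata[plotdata$OUTLIERFLAG == 1, ]
```

Because every row is compared against its own group's fences, duplicated values always receive the same flag within a group. Note that `boxplot.stats()` computes its quartiles as hinges via `fivenum()`, so its results can differ slightly from `quantile()` on small groups.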

Samuel