I have a data set and know how to get the five-number summary using the summary command. Now I need to get the instances above Q3 + 1.5 * IQR or below Q1 - 1.5 * IQR. Since these fences are just numbers, how would I return the instances from the data set that lie above the upper one or below the lower one?
6
-
Is this univariate data? Also can you provide sample data? – akash87 May 20 '17 at 19:47
-
@akash87 sorry I'm not sure what univariate data means – Diante May 20 '17 at 21:22
-
Univariate data is data that is a single vector, not a matrix or data frame. – akash87 May 20 '17 at 21:22
4 Answers
22
You can get this using boxplot. If your variable is x,
OutVals = boxplot(x)$out
which(x %in% OutVals)
If you are annoyed by the plot, you could use
OutVals = boxplot(x, plot=FALSE)$out
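The same outlier values can also be obtained from boxplot.stats, which computes the boxplot statistics without any plotting. The following is a minimal sketch, assuming x is a numeric vector; the sample values are made up for illustration.

```r
# Made-up sample data; 100 and -50 are the obvious extremes.
x <- c(2, 3, 4, 5, 5, 6, 7, 8, 100, -50)

# boxplot.stats() returns the same $out component as boxplot(),
# with coef = 1.5 as the default whisker multiplier.
out_vals <- boxplot.stats(x)$out

# Positions of the flagged observations, then the observations themselves.
out_idx <- which(x %in% out_vals)
out_idx
x[out_idx]
```

Note that boxplot and boxplot.stats build the fences from the hinges (see fivenum), which for small samples can differ slightly from the quartiles that summary or quantile report by default.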

G5W
3
If your dataset is x, you can get those numbers using
summary(x)[["1st Qu."]]
and
summary(x)[["3rd Qu."]]
Then build the fences from those two numbers (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, where IQR = Q3 - Q1) and compare x against them to get the values you want.
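Spelled out, the comparison might look like the sketch below. It assumes x is a numeric vector without missing values; the sample data and the variable names (q1, q3, lower, upper, is_out) are made up for illustration.

```r
# Made-up sample data for illustration.
x <- c(2, 3, 4, 5, 5, 6, 7, 8, 100, -50)

# Quartiles taken from the five-number summary.
q1 <- summary(x)[["1st Qu."]]
q3 <- summary(x)[["3rd Qu."]]
iqr <- q3 - q1

# Fences at 1.5 * IQR beyond the quartiles.
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# Logical mask of the observations outside the fences.
is_out <- x < lower | x > upper
which(is_out)   # positions of the outliers
x[is_out]       # the outlying values themselves
```

For a data frame, the same logical mask can be used to select rows, for example df[is_out, ] when the mask was computed from one of its columns.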

Bob Jansen
3
You can refer to the function remove_outliers in this answer here. It replaces every value outside the 1.5 * IQR fences with NA, so the outliers are the entries that become NA.
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}
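Since the question asks for the outlying instances rather than a cleaned vector, the same logic can be turned around. The helper below, get_outliers, is a hypothetical variant written for illustration and is not part of the linked answer; it returns the positions and values that fall outside the fences.

```r
# Hypothetical helper (not from the linked answer): returns the outlying
# values and their positions instead of masking them with NA.
get_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  # which() silently drops NA comparisons, so missing values are never flagged.
  idx <- which(x < (qnt[1] - H) | x > (qnt[2] + H))
  list(index = idx, value = x[idx])
}

# Made-up sample data, including a missing value.
x <- c(2, 3, 4, 5, 5, 6, 7, 8, 100, -50, NA)
res <- get_outliers(x)
res$index
res$value
```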
0
If you are trying to identify the outliers in your dataset using the 1.5 * IQR standard, the Boxplot function from the car package gives you the row number of each case that is an outlier within its level of the grouping variable (both below Q1 - 1.5 * IQR and above Q3 + 1.5 * IQR). It also draws a boxplot of your data, which gives insight into its distribution.
library(car)
Boxplot(DV ~ IV, data = datafile)
Where:
DV = measured variable
IV = grouping variable
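If installing car is not an option, the per-group idea can be approximated in base R. This is only a sketch under assumptions: the data frame below is made up, DV and IV mirror the placeholder names above, and the code does not reproduce car's plot or labeling, only the row numbers of values that boxplot.stats flags within each group.

```r
# Made-up data frame; DV and IV mirror the placeholder names above.
datafile <- data.frame(
  DV = c(1, 2, 3, 4, 50, 10, 11, 12, 13, -40),
  IV = rep(c("a", "b"), each = 5)
)

# For each group, flag rows whose DV is among that group's boxplot outliers.
# ave() stores the logical result in a numeric vector, hence the "== 1".
is_out <- ave(
  datafile$DV, datafile$IV,
  FUN = function(v) v %in% boxplot.stats(v)$out
) == 1

which(is_out)        # row numbers of the outliers
datafile[is_out, ]   # the outlying rows with their group
```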
-
Please share your code and data set. Reviewers are more likely to respond to a clear question, especially one that includes code. – Hamed Baziyad Jun 24 '20 at 21:56