I need to plot the outliers from a boxplot onto a map. My lecturer gave me this function to extract all outliers from the boxplot:

outliers = match(names(boxplot(pc3, plot = FALSE)$out), names(pc3))

(pc3 being the data)

I am then plotting them using:

points(Data.1$X[outliers], Data.1$Y[outliers], col = "red", cex = 3, lwd = 2)

However, I want to extract the positive outliers into one variable and the negative outliers into a different variable so that I can plot them in different colours. How do I do this?

Thank you.

TeddyTedTed
  • You already have the outliers and the data, right? Then you can determine from the data whether the outliers are above the mean (or median) or below it, and separate them out like that. Check out ?`[` for help with indexing. – Jota May 02 '14 at 13:48
  • How would I go about doing that? Apologies, I'm quite new to R. – TeddyTedTed May 02 '14 at 14:25
  • Depends. It would be easier to give an exact solution if you provided example data to work with (see [How to make a great R reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)). Since `outliers` holds positions in `pc3` rather than values, a simple way to do what I said would be: `above_outliers <- outliers[pc3[outliers] > mean(pc3)]` for the case of outliers above the mean. – Jota May 02 '14 at 14:41

1 Answer

boxplot defines outliers as points lying more than 1.5 times the inter-quartile range beyond the sides of the box (the 25th and 75th percentiles). You can apply that definition directly:

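# 25th and 75th percentiles (the sides of the box)
iq.range <- quantile(pc3, probs=c(0.25, 0.75))
# Bounds: 1.5 times the inter-quartile range beyond each side
lower.bound <- iq.range[1] - 1.5*diff(iq.range)
upper.bound <- iq.range[2] + 1.5*diff(iq.range)

# Outlier values (not positions) below and above the bounds
low.out <- pc3[pc3 < lower.bound]
high.out <- pc3[pc3 > upper.bound]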

That computes the bounds from scratch. Alternatively, you can split the outliers that boxplot returns at the median: anything above the median is a high outlier, and anything below it is a low one.
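Note that low.out and high.out hold outlier values, while the points() call in the question subsets Data.1 by position. A minimal sketch of the median split that keeps positions might look like the following. The sample data and the names mid, high.idx and low.idx are made up for illustration, and it assumes pc3 is a named numeric vector in the same order as the rows of Data.1:

```r
# Hypothetical stand-in data; replace with the real pc3 and Data.1
set.seed(1)
pc3 <- c(rnorm(50), -6, -5, 7)
names(pc3) <- paste0("site", seq_along(pc3))
Data.1 <- data.frame(X = runif(length(pc3)), Y = runif(length(pc3)))

# Positions of all boxplot outliers, exactly as in the question
outliers <- match(names(boxplot(pc3, plot = FALSE)$out), names(pc3))

# Split the positions by comparing each outlier's value with the median
mid <- median(pc3)
high.idx <- outliers[pc3[outliers] > mid]
low.idx  <- outliers[pc3[outliers] < mid]

# Plot all points, then mark the two groups in different colours
plot(Data.1$X, Data.1$Y)
points(Data.1$X[high.idx], Data.1$Y[high.idx], col = "red",  cex = 3, lwd = 2)
points(Data.1$X[low.idx],  Data.1$Y[low.idx],  col = "blue", cex = 3, lwd = 2)
```

Because high.idx and low.idx are positions, they can be used anywhere the original outliers variable was used.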

ilir
  • I assume that `high.out <- pc3[pc3 < high.bound]` should read `high.out <- pc3[pc3 < upper.bound]`. Is that correct? – TeddyTedTed May 02 '14 at 14:12
  • @user3596332 there was a mistake in the last line. Fixed it now. Thanks for pointing it out. – ilir May 02 '14 at 14:15
  • Also, when I originally plotted all outliers, these are the ones that came up: http://i.imgur.com/5WF8iZL.jpg. When I plotted your high.out, this one came up: http://i.imgur.com/VzucCIU.jpg, and with your low.out, these all came up: http://i.imgur.com/KX64zN9.jpg. Thank you for the help. I don't mean to sound unappreciative, I'm just trying to figure it out. I'm new to R and generally just copy and paste my lecturer's material. – TeddyTedTed May 02 '14 at 14:20
  • @user3596332 are you sure you are applying the edited code I have posted? There were a couple of typos in the original answer. You seem to be computing the upper bound using the wrong condition. The code I have posted works fine with sample data. – ilir May 02 '14 at 14:37