I am trying to annotate the outliers in a multi-group box-plot generated from the dataframe below:

       Chr. variable value
    1     1      W01 21270
    2     2      W01 15478
    3     3      W01 12479
    4     4      W01  9293
    5     5      W01  9936
    6     6      W01 13160
    7     7      W01 10386
    8     8      W01  8021
    9     9      W01  9627
    10   10      W01  9635
    11   11      W01 12918
    12   12      W01 11617
    13   13      W01  4158
    14   14      W01  6863
    15   15      W01  7259
    16   16      W01 10021
    17   17      W01 12567
    18   18      W01  3752
    19   19      W01 15910
    20   20      W01  5557
    21   21      W01  2908
    22   22      W01  5247
    23    X      W01  4052
    24    Y      W01    42
    25    1      W02 24278
    26    2      W02 17624
    27    3      W02 14105
    ...

I adopted the following solution from this thread:

    is_outlier <- function(x) {
      return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
    }

    dat.m %>%
      group_by(Chr.) %>%
      mutate(outlier = ifelse(is_outlier(value), value, as.numeric(NA))) %>%
      ggplot(., aes(x = factor(Chr.), y = value)) +
      geom_boxplot() +
      geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.3)

This, however, does not give the labels I want. I'd like to label the outliers with the corresponding entry in the variable column. Any suggestions are much appreciated!

RJF
  • I don't think I understand your example code. Should it work with your example dataset or the data in the linked question? Your variable is called `value` (I think), not `drat`. And you're missing the code that makes the text variable as a string, which is what the labels are based on in the linked answer. Can you add more of your code so we can see it (and maybe an example dataset that has an outlier in it :-) )? – aosmith Jun 17 '19 at 16:41
  • Thanks for your helpful suggestion. I edited accordingly! – RJF Jun 17 '19 at 18:25
  • I don't understand. Your plot has what you asked for: outliers labeled with their value. Can you be more specific about what is wrong in your opinion? – Axeman Jun 17 '19 at 18:42
  • Oh, great, I think I see now. That answer you're working with makes the labels in two steps: first they make a logical variable `is_outlier` and then they make the label from that. You could do this in one step in your `mutate()` call. Like `mutate(outlier = ifelse(is_outlier(value), as.character(variable), NA))`. Notice I use `variable` for the labels instead of `value`. – aosmith Jun 17 '19 at 18:53
  • @aosmith Thanks a lot for your help. It resolves the problem. Could you please also post it as an answer so I can mark this thread as resolved? – RJF Jun 17 '19 at 19:27

1 Answer

Right now you are making the labeling variable `outlier` out of `value` instead of out of `variable`.

In order to use `variable` as the label you'll want to change the code within `mutate()` to something like:

    mutate(outlier = ifelse(is_outlier(value), as.character(variable), NA))
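For context, here is a minimal sketch of that line dropped into the rest of the pipeline from the question. The toy `dat.m` below is made up for illustration (three chromosomes, eight samples, one extreme value planted per chromosome) and is not the asker's data; the intermediate name `labelled` is also just a placeholder.

```r
library(dplyr)
library(ggplot2)

# Toy data, invented for this sketch: 3 chromosomes x 8 samples,
# with one extreme value planted in each chromosome group.
dat.m <- data.frame(
  Chr.     = rep(c("1", "2", "X"), each = 8),
  variable = factor(rep(sprintf("W%02d", 1:8), times = 3)),
  value    = c(100, 102,  98, 101,  99, 103,  97, 200,  # W08 is extreme
                50,  52,  48,  51,   5,  49,  53,  47,  # W05 is extreme
                10,  90,  12,  11,   9,  13,  10,  12)  # W02 is extreme
)

# Same 1.5 * IQR rule as in the question.
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

# Flag outliers within each chromosome and keep the sample name as the label.
labelled <- dat.m %>%
  group_by(Chr.) %>%
  mutate(outlier = ifelse(is_outlier(value), as.character(variable), NA_character_)) %>%
  ungroup()

# Box plot per chromosome; only rows with a non-missing label get text.
p <- ggplot(labelled, aes(x = factor(Chr.), y = value)) +
  geom_boxplot() +
  geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.3)
```

Printing `p` draws the plot. The only change from the question's code is what goes into `outlier`: the sample name from `variable` rather than the number in `value`.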

The `as.character()` part of the code has to do with working with factors. If `variable` is already a character instead of a factor you won't need it.
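To see why the conversion matters, here is a small base R sketch with a made-up factor and a made-up logical flag (neither comes from the question). As far as I know, `ifelse()` drops the factor attributes, so without `as.character()` you get the underlying integer codes instead of the level names.

```r
# Made-up inputs for illustration only.
variable <- factor(c("W01", "W02", "W03"))
flag     <- c(TRUE, FALSE, TRUE)

# Passing the factor directly: the result holds the factor's integer codes,
# which would show up as numbers on the plot rather than sample names.
from_factor <- ifelse(flag, variable, NA)

# Converting first: the result holds the level names as character strings.
from_character <- ifelse(flag, as.character(variable), NA)

str(from_factor)
str(from_character)
```

If the labels on your plot come out as small integers, an unconverted factor is a likely cause.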

aosmith