library(ggplot2)

A <- c(rep(LETTERS[1:5],2))
B <- rep(c("one", "two"),5)
set.seed(200)
C <- round(rnorm(10),2)
dff <- data.frame(A,B,C)
dff

ggplot(dff, aes(x=B, y=C, fill=B)) + 
    geom_boxplot()

Is it possible to use A to label the outliers?

Nip
  • Hi, this might help: https://stackoverflow.com/questions/33524669/labeling-outliers-of-boxplots-in-r – Christian Johansson Mar 21 '20 at 16:10
  • I like the second answer to the question suggested in the previous comment, but it uses rownames to label the outliers. I don't think I can use A as rownames, since its labels repeat. – Nip Mar 21 '20 at 16:30

2 Answers


Here's a solution to label only the outliers in your data:

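library(tidyverse)

# Collect the outlier values of C within each group of B
outlier <- dff %>%
  group_by(B) %>%
  summarise(outlier = list(boxplot.stats(C)$out))

# Label a point with A when its C value is among the collected outliers
ggplot(dff, aes(x = B, y = C, fill = B)) +
  geom_boxplot() +
  geom_text(aes(label = if_else(C %in% unlist(outlier$outlier), as.character(A), "")),
            position = position_nudge(x = -0.1))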

which produces this plot:

[image: resulting plot]
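
One edge case in the snippet above: `unlist(outlier$outlier)` pools the outlier values from both groups, so a point in one group would also get a label if its C happened to equal an outlier value from the other group. Below is a minimal sketch of a per-group variant, assuming the same `dff` as in the question. The names `labelled` and `label` are introduced here for illustration and are not part of the original answer.

```r
library(dplyr)
library(ggplot2)

# Same data as in the question
A <- rep(LETTERS[1:5], 2)
B <- rep(c("one", "two"), 5)
set.seed(200)
C <- round(rnorm(10), 2)
dff <- data.frame(A, B, C)

# Inside a grouped mutate(), C holds only the current group's values,
# so each point is compared with the outliers of its own group.
labelled <- dff %>%
  group_by(B) %>%
  mutate(label = if_else(C %in% boxplot.stats(C)$out, as.character(A), "")) %>%
  ungroup()

p <- ggplot(labelled, aes(x = B, y = C, fill = B)) +
  geom_boxplot() +
  geom_text(aes(label = label), position = position_nudge(x = -0.1))
p
```

Storing the label as a column also keeps the `aes()` call short and makes it easy to inspect which rows were flagged before plotting.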

Caitlin

I adapted the second answer to the question linked in the first comment to suit my case.

library(dplyr)
library(ggplot2)

# TRUE for values beyond 1.5 * IQR from the first or third quartile
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

# Within each group of B, keep the label A for outliers only; NA elsewhere
dat <- dff %>%
  group_by(B) %>%
  mutate(outlier_label = if_else(is_outlier(C), as.character(A), NA_character_)) %>%
  ungroup()

ggplot(dat, aes(x = factor(B), y = C, fill = factor(B))) +
  geom_boxplot() +
  geom_text(aes(label = outlier_label), na.rm = TRUE, nudge_y = 0.05)

Might not be the best answer :D
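
A side note on the two outlier rules used in these answers: `boxplot.stats()` places its fences at Tukey's hinges (as computed by `fivenum()`), whereas `is_outlier()` above uses `quantile()` and `IQR()`. The ggplot2 documentation for `geom_boxplot()` says its hinges are the first and third quartiles and that this differs slightly from `boxplot()`, so the quantile version is likely the one that matches the points ggplot2 draws. For small groups the two rules can disagree. Here is a small base-R sketch with made-up numbers; the vector `x` and the helper names `quantile_rule` and `hinge_rule` are hypothetical and only for illustration.

```r
# Made-up data: six values, the last one moderately far from the rest
x <- c(1, 2, 3, 4, 5, 9)

# Quartile-based rule, as in is_outlier() above
quantile_rule <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  x < q[1] - 1.5 * IQR(x) | x > q[2] + 1.5 * IQR(x)
}

# Hinge-based rule, as applied by boxplot.stats()
hinge_rule <- function(x) {
  x %in% boxplot.stats(x)$out
}

x[quantile_rule(x)]  # 9: upper fence is 4.75 + 1.5 * 2.5 = 8.5
x[hinge_rule(x)]     # nothing flagged: upper fence is 5 + 1.5 * 3 = 9.5
```

With five points per group, as in the question, both definitions give the same hinges, so the difference only matters for other group sizes.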

Nip