I am trying to label outliers in a ggplot2 box plot. Regarding my code, I have two questions:
1. Why does it not label the low outliers, i.e. those below -1.5*IQR?
2. Why does it not label outliers based on the group they are in, but instead apparently refers to the overall mean of the data? I would like to label outliers for each box plot individually, i.e. the outliers for Country A in Wave 1 (of a survey), and so on.
A sample of my code:
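library(ggplot2)

set.seed(1)  # make the random sample reproducible
PERCENT <- rnorm(50, sd = 3)                                          # value shown on the y axis
WAVE <- sample(6, 50, replace = TRUE)                                 # survey wave 1-6
AGE_GROUP <- rep(c("21-30", "31-40", "41-50", "51-60", "61-70"), 10)
COUNTRY <- rep(c("Country A", "Country B"), 25)
N <- rnorm(50, mean = 200, sd = 2)                                    # sample size, used in the label
df <- data.frame(PERCENT, WAVE, AGE_GROUP, COUNTRY, N)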
ggplot(df, aes(x = factor(WAVE), y = PERCENT, fill = factor(COUNTRY))) +
  geom_boxplot(alpha = 0.3) +
  geom_point(aes(color = AGE_GROUP, group = factor(COUNTRY)),
             position = position_dodge(width = 0.75)) +
  # label points whose PERCENT lies beyond +/- 1.5 * IQR(PERCENT)
  geom_text(aes(label = ifelse(PERCENT > 1.5 * IQR(PERCENT) | PERCENT < -1.5 * IQR(PERCENT),
                               paste(AGE_GROUP, ",", round(PERCENT, 1), "%, n =", round(N, 0)),
                               "")),
            hjust = -.3, size = 3)
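For comparison, here is a minimal sketch that assumes the goal is to reproduce the rule geom_boxplot itself uses when it draws outlier points: an observation is flagged when it lies below Q1 - 1.5*IQR or above Q3 + 1.5*IQR of its *own* box, with quartiles from quantile()'s default type 7. The flags are computed per WAVE/COUNTRY combination before plotting, rather than inside aes(). The helper is_outlier() and the columns OUTLIER and LABEL are hypothetical names. Dodging the labels with the same position_dodge(width = 0.75) as the points is also an added assumption, so that each label sits next to its point.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  PERCENT   = rnorm(50, sd = 3),
  WAVE      = sample(6, 50, replace = TRUE),
  AGE_GROUP = rep(c("21-30", "31-40", "41-50", "51-60", "61-70"), 10),
  COUNTRY   = rep(c("Country A", "Country B"), 25),
  N         = rnorm(50, mean = 200, sd = 2)
)

# Hypothetical helper: Tukey's fences, i.e. outside [Q1 - coef*IQR, Q3 + coef*IQR]
is_outlier <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75))
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

# Apply the rule separately within each WAVE x COUNTRY box;
# ave() returns a numeric 0/1 vector, so convert back to logical
df$OUTLIER <- as.logical(ave(df$PERCENT, df$WAVE, df$COUNTRY, FUN = is_outlier))

# Build the label only for flagged rows
df$LABEL <- ifelse(df$OUTLIER,
                   paste(df$AGE_GROUP, ",", round(df$PERCENT, 1), "%, n =", round(df$N, 0)),
                   "")

dodge <- position_dodge(width = 0.75)
p <- ggplot(df, aes(x = factor(WAVE), y = PERCENT, fill = factor(COUNTRY))) +
  geom_boxplot(alpha = 0.3) +
  geom_point(aes(color = AGE_GROUP, group = factor(COUNTRY)), position = dodge) +
  # group = COUNTRY so the labels are dodged like the points, not split by label text
  geom_text(aes(label = LABEL, group = factor(COUNTRY)),
            position = dodge, hjust = -0.3, size = 3)
print(p)
```

The design choice here is to precompute the flags in the data frame. Expressions inside aes() are evaluated on the whole layer's data, so a grouped statistic is easier to get right outside ggplot and then map as a plain column.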
A picture of what I have so far:
I appreciate your help!