I'm running into several problems producing a graph that labels outliers. This is a sample of my data:
dput(x)
structure(list(Sample = c("EM16b", "TK01a", "TK34a", "WB25",
"BK29", "EM09b", "TK02b", "TK29a", "TK20a", "PT57", "PT59", "EM18a",
"PT01", "EM05b", "BK16", "EM07b", "WB17", "BK01", "PT22", "WB07"
), Tribe = c("ElMolo", "Turka", "Turka", "Webuye", "Baka", "ElMolo",
"Turka", "Turka", "Turka", "PT_Luo", "PT_Luo", "ElMolo", "PT_Luo",
"ElMolo", "Baka", "ElMolo", "Webuye", "Baka", "PT_Luhya", "Webuye"
), Breath.d13C = c(-22.63, -18.91, -19.23, -17.61, -23.07, -21.26,
-16.86, -14.54, -22.37, NA, -23.22, -17.54, -19.76, -20.18, -24.52,
-19.17, -16.09, -25.62, -19.92, -15.04)), .Names = c("Sample",
"Tribe", "Breath.d13C"), row.names = c(32L, 65L, 131L, 166L,
270L, 18L, 68L, 121L, 103L, 231L, 233L, 35L, 175L, 10L, 257L,
14L, 156L, 242L, 196L, 145L), class = "data.frame")
First, R imported all columns as factors (which it never did before). To convert them back to character and numeric I used
x$Sample<-as.character(x$Sample)
x$Tribe<-as.character(x$Tribe)
x$Breath.d13C<-as.numeric(as.character(x$Breath.d13C))
The dput output above was produced after this conversion.
Moving on, I usually produce these graphs using the following code, which I found here:
library(dplyr)
library(ggplot2)
is_outlier <- function(x) {
return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}
x %>%
group_by(Tribe) %>%
mutate(outlier = ifelse(is_outlier(Breath.d13C), Sample, as.numeric(NA))) %>%
ggplot(., aes(x=factor(Tribe), y = Breath.d13C)) +
geom_boxplot() +
geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.3)+
scale_y_continuous(name=expression(delta^13*C["Breath"]*" "("\u2030")),
limits=c(-25,-10),
breaks=seq(-25, -10,1),
)
1) My first problem was that the outlier code couldn't cope with NAs (I thought the as.numeric(NA) was supposed to deal with this, but it didn't work). I tried adding na.rm or na.omit at several points in mutate without success. I ended up running x<-filter(x,!is.na(Breath.d13C)) beforehand, but would prefer a more elegant solution.
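One possible direction for the NA issue, as a minimal sketch only: pass na.rm = TRUE to quantile() so the quartiles are computed from the non-missing values. The name is_outlier_na is a hypothetical placeholder, not part of the code I found, and it assumes the same 1.5 * IQR rule as is_outlier above.

```r
# Hypothetical NA-tolerant variant of is_outlier (same 1.5 * IQR rule).
is_outlier_na <- function(x) {
  # Quartiles computed from the non-missing values only
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  # Missing inputs stay NA in the result instead of raising an error
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

# Small made-up vector, not taken from my data
is_outlier_na(c(1, 2, 3, NA, 100))
```

Missing inputs come back as NA rather than TRUE or FALSE, so anything built on top of this helper would still need to decide what to do with those rows.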
When I then run the plotting code above, it gives me #Error: incompatible size (4), expecting 3 (the group size) or 1
2) I searched for similar errors, and mutate_ seems to be the recommended solution. I tried it and the code did run, but it no longer labelled the outliers.
Any idea why?
Sorry if this turns out to be very simple, but I'm still a beginner in the R world.