This is built using these sample data:
sampleData <-
  data.frame(
    col1 = factor(rep(LETTERS[1:4], c(12, 6, 16, 20)*5)
                  , levels = LETTERS[1:4])
    , col2 = factor(rep(LETTERS[1:4], c(1, 17, 16, 20)*5)
                    , levels = LETTERS[1:4])
    , col3 = factor(rep(LETTERS[1:4], c(0, 18, 16, 20)*5)
                    , levels = LETTERS[1:4])
    , col4 = rnorm(54*5, 4, 2)
    , col5 = factor(rep(1:5, 54))
  )
The basic approach is to add the label yourself. For that, I used table() to count the occurrences of each X/color combination and built a new data.frame to display those counts. Note that, while you say that each color within the X groupings always has the same sample size, it is better to program defensively. Instead of trusting that (and, e.g., using the counts for the first color), I use apply() to get all of the unique values. As long as there is only one, the effect is the same. However, if there is more than one, the label lists all of them (separated by ";"), which flags the problem.
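As a small illustration of that defensive count, here is a base-R sketch on a made-up data frame (toy, x and color are hypothetical names, not from your data). Level A has the same count for every color, while level B does not, so B's label shows both values.

```r
# Toy data: A has 2 rows per color; B has 2, 2 and 1 rows per color
toy <- data.frame(
  x     = rep(c("A", "B"), times = c(6, 5)),
  color = c(rep(1:3, each = 2), 1, 1, 2, 2, 3)
)

# Rows are x levels, columns are colors
counts <- table(toy$x, toy$color)

# Collapse each row to its unique counts; more than one value
# means the group sizes differ within that x level
labels <- apply(counts, 1, function(n) paste(unique(n), collapse = ";"))

paste("n =", labels)
# A gives "n = 2"; B gives "n = 2;1"
```

A label such as "n = 2;1" is the indication that the equal-sample-size assumption does not hold for that level.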
In addition, I went ahead and switched the mapping to use aes_string() so that your column names populate through as the axis labels. If you don't like that behavior, just override with ylab() etc.
Similarly, the function sem() was not found (I assume it is a custom function), so I used mean_cl_normal() instead, which has the added advantage of working through the fun.data argument for cleaner code. (I also prefer confidence intervals to just showing SEM, but that is more style than substance.)
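To make the SEM versus confidence interval distinction concrete, here is a rough base-R sketch. Both helpers are hypothetical: sem() is only my guess at what the custom function computed, and mean_ci() approximates what I understand mean_cl_normal() to return (a t-based 95% interval, in the y/ymin/ymax columns that fun.data expects).

```r
# Hypothetical stand-in for the custom sem(): standard error of the mean
sem <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Hypothetical sketch of a t-based confidence interval for the mean,
# returned in the y / ymin / ymax layout used by stat_summary(fun.data = ...)
mean_ci <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  half <- qt(1 - (1 - conf) / 2, df = n - 1) * sd(x) / sqrt(n)
  data.frame(y = mean(x), ymin = mean(x) - half, ymax = mean(x) + half)
}

vals <- c(2, 4, 4, 6, 9)
sem(vals)      # half-width of a mean +/- SEM bar
mean_ci(vals)  # wider than mean +/- SEM, because the t multiplier exceeds 1
```

With small groups the two can differ a lot, so it is worth stating in the figure legend which one the error bars show.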
library(ggplot2)  # mean_cl_normal() also needs the Hmisc package installed
library(dplyr)    # for %>%

getPlotList <- function(param.list, data=db, y, color){
  param.list %>% sapply(function(var){
    # Count rows for each x level (rows) by color (columns)
    myCounts <- table(data[[var]], data[[color]])
    # One label per x level; more than one value means unequal group sizes
    forLabels <-
      data.frame(
        x = row.names(myCounts)
        , label = paste("n =", apply(myCounts, 1, function(x){paste(unique(x), collapse = ";")}))
        , y = 0.5
      )
    ggplot(data=data, aes_string(x=var, y=y, color=color))+
      stat_summary(fun.data = mean_cl_normal, position = position_dodge(0.3), na.rm = TRUE) +
      # fun.y was renamed to fun in ggplot2 3.3.0; use fun.y on older versions
      stat_summary(fun = mean, geom = "point", position = position_dodge(0.3), na.rm = TRUE) +
      ylim(0, NA) +
      geom_text(aes(x = x, y = y, label = label, color = NA)
                , forLabels
                , show.legend = FALSE)
  }, simplify = FALSE, USE.NAMES = TRUE)
}
Now, this code:
c("col1", "col2", "col3") %>% getPlotList(y="col4", color="col5", data = sampleData)
gives the following plots:

At the request of @Nettle, I modified the code to use a bit more of the tidyverse, specifically using Standard Evaluation (converting the column names with !!as.name()) to loop through the column list instead of using the base table() approach from before. I believe that the code should function identically. The main advantage is removing the intermediate variables, though one could argue that those improve readability.
getPlotList <- function(param.list, data=db, y, color){
  param.list %>% sapply(function(var){
    ggplot(data=data, aes_string(x=var, y=y, color=color))+
      stat_summary(fun.data = mean_cl_normal, position = position_dodge(0.3), na.rm = TRUE) +
      # fun.y was renamed to fun in ggplot2 3.3.0; use fun.y on older versions
      stat_summary(fun = mean, geom = "point", position = position_dodge(0.3), na.rm = TRUE) +
      ylim(0, NA) +
      geom_text(aes_string(x = var, y = "y", label = "label", color = NA)
                , data %>%
                  # Count rows per x/color pair, then collapse to one label per x level
                  count(!!as.name(var), !!as.name(color)) %>%
                  group_by(!!as.name(var)) %>%
                  summarise(
                    label = paste("n =", paste(unique(n), collapse = ";"))
                  ) %>%
                  mutate(y = 0.5)
                , show.legend = FALSE)
  }, simplify = FALSE, USE.NAMES = TRUE)
}
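One place where I would expect the two versions to differ is a factor level with no rows at all (such as A in col3 of the sample data). As far as I know, table() reports such a level with a count of 0, while count() drops it by default. The sketch below checks this on a made-up data frame (toy, grp and colr are hypothetical names) and uses the .drop argument of count() to keep the empty level. It assumes a dplyr version that has .drop (0.8.0 or later, I believe).

```r
library(dplyr)

# Toy data: level "A" is declared but has no rows
toy <- data.frame(
  grp  = factor(c("B", "B", "C", "C"), levels = c("A", "B", "C")),
  colr = factor(c(1, 2, 1, 2))
)
var   <- "grp"
color <- "colr"

# Helper for this sketch only: build one label per level of `var`
makeLabels <- function(data, drop) {
  data %>%
    count(!!as.name(var), !!as.name(color), .drop = drop) %>%
    group_by(!!as.name(var)) %>%
    summarise(label = paste("n =", paste(unique(n), collapse = ";")))
}

dropped <- makeLabels(toy, drop = TRUE)   # default behavior: no row for "A"
kept    <- makeLabels(toy, drop = FALSE)  # "A" is kept with a zero count

dropped
kept
```

If you want the "n = 0" label that the table() version produces, passing .drop = FALSE to count() inside getPlotList should give it back, though I have not checked that against your real data.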