I am surprised that you get the problem you describe with the boxplot colors because, when I run your code, the boxplots I obtain do have different fill colors, for example:

However, one problem with your code is that you cannot tell what is being plotted in each graph (every plot shows `x` and `X` as axis labels). This happens because, when `lapply()` is applied to a data frame, the function it calls receives only the values of each column, not the column's name.
I therefore suggest the following improvement to your code, so that each plot shows the names of the analyzed variables in its axis labels. The solution was inspired by the first comment by Akrun on this post, which addresses precisely this problem with `lapply()` and suggests passing `names(obj)` instead of `obj` as the argument of `lapply()`.
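```r
library(ggplot2)
library(dplyr)

# Load the Salaries data set from the carData package
data("Salaries", package = "carData")

# For the factor column `xname` of `df`, draw one boxplot per numeric
# column of `df`, grouped (and filled) by that factor
f <- function(df, xname) {
  x <- df[[xname]]
  toplot <- df %>% select_if(is.numeric)
  # Iterate over the column names (not the columns themselves),
  # so each name is available for the axis label
  lapply(names(toplot), FUN = function(yname) {
    y <- toplot[[yname]]
    # print() is required because the results are hidden by invisible() below
    print(ggplot(mapping = aes(x, y, fill = x)) +
      geom_boxplot(col = "black") +
      xlab(xname) +
      ylab(yname))
  })
}

# Call f() once for every factor column of Salaries
Salaries_factors <- Salaries %>% select_if(is.factor)
invisible(lapply(names(Salaries_factors), FUN = function(factor_name) f(Salaries, factor_name)))
```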
In summary, the main change with respect to your code is replacing `lapply(Salaries, ...)` with `lapply(names(Salaries), ...)` on the last line.
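The difference between iterating over a data frame and iterating over its names can be seen in isolation with a small base R sketch. The data frame below is made up for illustration and only stands in for `Salaries`:

```r
# Hypothetical toy data standing in for Salaries
df <- data.frame(
  sex = factor(c("Male", "Female", "Male")),
  salary = c(100, 90, 110)
)

# lapply() over the data frame: the function receives only each column's
# values, so the column name cannot be used inside it (e.g. as a label)
by_column <- lapply(df, function(col) class(col))

# lapply() over names(df): the function receives the name and looks up
# the values with df[[nm]], so both are available inside it
by_name <- lapply(names(df), function(nm) paste(nm, class(df[[nm]])))
```

Inside the first function there is no way to recover `"sex"` or `"salary"`, whereas the second one can pass `nm` to `xlab()` or `ylab()`, which is what `f()` does with `xname` and `yname`.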
When we run this code, we get plots such as the boxplot shown at the end (the distribution of the `salary` variable by the `sex` factor), where both the horizontal and the vertical axis labels identify the variables being plotted.
I also made the following side changes to your original code:
- I made the function applicable to other datasets by adding the data frame containing the data as its first parameter.
- I wrapped the outer `lapply()` call in `invisible()` to suppress the (possibly unwanted) output that `lapply()` auto-prints for the groups analyzed at each iteration(*). This in turn required wrapping the `ggplot()` call in `print()`; otherwise, no plots are generated.
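The interplay between `invisible()` and `print()` can be checked without any plotting. In this base R sketch, the group names are placeholders and `print(nm)` stands in for `print(ggplot(...))`:

```r
# Placeholder group names standing in for the factors of Salaries
group_names <- c("rank", "sex")

# invisible() marks the returned list as invisible, so the console
# does not auto-print it (no [[1]], [[2]] listing)
result <- withVisible(invisible(lapply(group_names, function(nm) nchar(nm))))
result$visible

# An explicit print() inside the function still writes its output,
# which is why the ggplot() call has to be wrapped in print()
printed <- capture.output(
  invisible(lapply(group_names, function(nm) print(nm)))
)
printed
```

`result$visible` is `FALSE`, while `printed` still contains one line per group, because only the top-level return value is hidden.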
(*) As a caveat, if the automatic printing of `lapply()` is of interest, removing the `invisible()` call would NOT show informative group labels with this solution: the output is labelled `[[1]]`, `[[2]]`, etc., instead of `$rank`, `$sex`, etc., because `names(...)` returns an unnamed character vector.
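If the named output is wanted, one option (not part of the solution above) is `sapply()` with `simplify = FALSE`, which uses the values of a character vector as the names of the result, or wrapping `lapply()` in `setNames()`. A base R sketch with placeholder values:

```r
group_names <- c("rank", "discipline", "sex")

# lapply() over a plain character vector returns an unnamed list,
# which prints as [[1]], [[2]], [[3]]
unnamed <- lapply(group_names, function(nm) toupper(nm))

# sapply() with simplify = FALSE keeps the list structure but names it
# after the character values, so it prints as $rank, $discipline, $sex
named <- sapply(group_names, function(nm) toupper(nm), simplify = FALSE)

# Equivalent alternative: name the lapply() result explicitly
named2 <- setNames(lapply(group_names, function(nm) toupper(nm)), group_names)

print(named)
```

Applied to the code above, this would mean replacing `lapply(names(Salaries_factors), ...)` with `sapply(names(Salaries_factors), ..., simplify = FALSE)` and dropping `invisible()`; this is an untested suggestion rather than part of the original answer.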
