Using apply functions with ggplot to plot a subset of dataframe columns

Question

I have a data frame df with many columns ... I'd like to plot a subset of those columns, where c is a character vector naming the columns I'd like to plot.

I'm currently doing the following:

df <-structure(list(Image.Name = structure(1:5, .Label = c("D1C1", "D2C2", "D4C1", "D5C3", "D6C2"), class = "factor"), Experiment = structure(1:5, .Label = c("020718 perfusion EPC_BC_HCT115_Day 5", "020718 perfusion EPC_BC_HCT115_Day 6", "020718 perfusion EPC_BC_HCT115_Day 7", "020718 perfusion EPC_BC_HCT115_Day 8", "020718 perfusion EPC_BC_HCT115_Day 9"), class = "factor"), Type = structure(c(2L, 1L, 1L, 2L, 1L), .Label = c("VMO", "VMT"), class = "factor"), Date = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "18-Apr-18", class = "factor"), Time = structure(1:5, .Label = c("12:42:02 PM", "12:42:29 PM", "12:42:53 PM", "12:43:44 PM", "12:44:23 PM"), class = "factor"),     Low.Threshold = c(10L, 10L, 10L, 10L, 10L), High.Threshold = c(255L,     255L, 255L, 255L, 255L), Vessel.Thickness = c(7L, 7L, 7L,     7L, 7L), Small.Particles = c(0L, 0L, 0L, 0L, 0L), Fill.Holes = c(0L,     0L, 0L, 0L, 0L), Scaling.factor = c(0.001333333, 0.001333333,     0.001333333, 0.001333333, 0.001333333), X = c(NA, NA, NA,     NA, NA), Explant.area = c(1.465629333, 1.093447111, 1.014612444,     1.166950222, 1.262710222), Vessels.area = c(0.255562667,     0.185208889, 0.195792, 0.153907556, 0.227996444), Vessels.percentage.area = c(17.43706003,     16.93807474, 19.29722044, 13.18887067, 18.05611774), Total.Number.of.Junctions = c(56L,     32L, 39L, 18L, 46L), Junctions.density = c(38.20884225, 29.26524719,     38.43832215, 15.42482246, 36.42957758), Total.Vessels.Length = c(12.19494843,     9.545333135, 10.2007416, 7.686755647, 11.94211976), Average.Vessels.Length = c(0.182014156,     0.153956986, 0.188902622, 0.08938088, 0.183724919), Total.Number.of.End.Points = c(187L,     153L, 145L, 188L, 167L), Average.Lacunarity = c(0.722820111,     0.919723402, 0.86403871, 1.115896082, 0.821753818)), .Names = c("Image.Name", "Experiment", "Type", "Date", "Time", "Low.Threshold", "High.Threshold", "Vessel.Thickness", "Small.Particles", "Fill.Holes", "Scaling.factor", "X", "Explant.area", "Vessels.area", 
"Vessels.percentage.area", "Total.Number.of.Junctions", "Junctions.density", "Total.Vessels.Length", "Average.Vessels.Length", "Total.Number.of.End.Points", "Average.Lacunarity"), row.names = c(NA, -5L), class = "data.frame")


doBarPlot <- function(x) {
  p <- ggplot(x, aes_string(x="Type", y=colnames(x), fill="Type") ) +
    stat_summary(fun.y = "mean", geom = "bar", na.rm = TRUE) +
    stat_summary(fun.data = "mean_cl_normal", geom = "errorbar", width=0.5, na.rm = TRUE) +
    ggtitle("VMO vs. VMT") +
    theme(plot.title = element_text(hjust = 0.5) )
  print(p)
  ggsave(sprintf("plots/%s_bars.pdf", colnames(x) ) )
  return(p)
}

c = c('Total.Vessels.Length', 'Total.Number.of.Junctions', 'Total.Number.of.End.Points', 'Average.Lacunarity')
p[c] <- lapply(df[c], doBarPlot)

However, this yields the following error:

Error: ggplot2 doesn't know how to deal with data of class numeric

Debugging shows that x inside doBarPlot is a numeric vector rather than a data.frame, so ggplot errors. However, test <- df[c] yields a data.frame.

Why is x a numeric? What's the best way to apply doBarPlot without resorting to a loop?

`lapply` is pulling each column off as a vector (see `lapply(iris, class)`) whereas ggplot is expecting a data.frame. Easy solution is to `gather` the data and then use `facet_wrap` — Richard Telford, May 08 '18 at 07:40
You could also consider passing just the name of the column that you want to plot to your function, rather than a data frame. Also, when asking a question, providing a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) will make it easier for others to help you. — Mikko Marttila, May 08 '18 at 08:42
@MikkoMarttila I've added an example dataframe. Hope this helps — agf1997, May 08 '18 at 08:58

Mikko Marttila · Accepted Answer · 2018-05-08T10:01:35.607

As others have noted, the problem with your initial approach is that when you use lapply on a data frame, the elements that you are iterating over will be the column vectors, rather than 1-column data frames. However, even if you did iterate over 1-column data frames, your function would fail: the data frame supplied to the ggplot call wouldn't contain the Type column that you use in the plot.
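
Both points can be checked in base R. The following is a minimal sketch on a small stand-in data frame (toy and its columns a and b are made up for illustration; they are not the asker's data):

```r
# Small stand-in data frame (hypothetical, not the asker's data).
toy <- data.frame(Type = c("VMO", "VMT"), a = c(1.5, 2.5), b = c(3L, 4L))

# lapply() over a data frame visits each column as a plain vector,
# so the function only ever sees numeric/integer vectors.
col_classes <- lapply(toy[c("a", "b")], class)

# Single-bracket subsetting by name keeps the data.frame class,
# but the Type column is no longer part of the result.
one_col <- toy["a"]
one_col_class <- class(one_col)
one_col_names <- names(one_col)
```

Inspecting col_classes shows vector classes only, while one_col is still a data frame that has lost Type, which is why neither form can feed the original function.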

Instead, you could modify the function to take two arguments: the full data frame, and the name of the column that you want to use on the y-axis.

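# Requires ggplot2; mean_cl_normal additionally needs the Hmisc package.
# data: the full data frame; y: name of the column to plot, as a string.
doBarPlot <- function(data, y) {
  # aes_string() maps columns given by name. It is deprecated in recent
  # ggplot2 releases, as is fun.y (renamed to fun in ggplot2 3.3.0).
  p <- ggplot(data, aes_string(x = "Type", y = y, fill = "Type")) +
    stat_summary(fun.y = "mean", geom = "bar", na.rm = TRUE) +
    stat_summary(
      fun.data = "mean_cl_normal",
      geom = "errorbar",
      width = 0.5,
      na.rm = TRUE
    ) +
    ggtitle("VMO vs. VMT") +
    theme(plot.title = element_text(hjust = 0.5))
  print(p)
  # Save this plot explicitly; the plots/ directory must already exist.
  ggsave(sprintf("plots/%s_bars.pdf", y), plot = p)
  return(p)
}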

Then, you can use lapply to iterate over the character vector of columns you want to plot, while supplying the data frame via the ... as a fixed argument to your plotting function:

library(ggplot2)

cols <- c('Total.Vessels.Length', 'Total.Number.of.Junctions',
          'Total.Number.of.End.Points', 'Average.Lacunarity')
p <- lapply(cols, doBarPlot, data = df)
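
To see how the fixed argument travels through ..., here is a minimal base R sketch. meanByType and toy are hypothetical stand-ins: the function has the same (data, y) signature as doBarPlot but returns per-Type means instead of a plot, so it runs without ggplot2.

```r
# Hypothetical stand-in for doBarPlot: takes the full data frame and a column
# name, and returns the per-Type mean of that column instead of a plot.
meanByType <- function(data, y) {
  tapply(data[[y]], data$Type, mean)
}

toy <- data.frame(
  Type = c("VMO", "VMO", "VMT", "VMT"),
  len  = c(1, 3, 10, 20),
  junc = c(2, 4, 6, 8)
)
cols <- c("len", "junc")

# Each element of cols is passed positionally and ends up as y, because
# data = toy is matched by name and forwarded unchanged via `...`.
res <- lapply(cols, meanByType, data = toy)

# lapply() over an unnamed character vector returns an unnamed list, so
# attach the column names to look results up by name afterwards.
names(res) <- cols
```

The same naming step would apply to the list of plots, for example names(p) <- cols, if you want to retrieve a plot by column name as the question's p[c] assignment intended.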

Further, if you don't mind having all of the plots in one file, you could also use tidyr::gather to reshape your data into long form, and use facet_wrap in your plot (as suggested by @RichardTelford in his comment), avoiding the iteration and the need for a function altogether:

library(tidyverse)

df %>% 
  gather(variable, value, all_of(cols)) %>% 
  ggplot(aes(x = Type, y = value, fill = Type)) +
    facet_wrap(~ variable, scales = "free_y") +
    stat_summary(fun.y = "mean", geom = "bar", na.rm = TRUE) +
    stat_summary(
      fun.data = "mean_cl_normal",
      geom = "errorbar",
      width = 0.5,
      na.rm = TRUE
    ) +
    ggtitle("VMO vs. VMT") +
    theme(plot.title = element_text(hjust = 0.5))
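
For readers unfamiliar with long form, the reshaping step can be sketched in base R. This is only an illustration of the resulting shape, using stack() as a stand-in for tidyr::gather; toy and its columns are made up.

```r
# Hypothetical wide data: one row per image, one column per measurement.
toy <- data.frame(
  Type = c("VMO", "VMT", "VMO"),
  len  = c(1, 2, 3),
  junc = c(10, 20, 30)
)
cols <- c("len", "junc")

# stack() concatenates the selected columns into `values` and records the
# source column name in `ind`.
stacked <- stack(toy[cols])

# Type is repeated once per stacked column so each value keeps its group.
long <- data.frame(
  Type     = rep(toy$Type, times = length(cols)),
  variable = stacked$ind,
  value    = stacked$values
)
```

Each original row now appears once per measured variable, which is the layout facet_wrap(~ variable) relies on to draw one panel per column.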

Thanks so much ... I had tried something similar to `p <- lapply(cols, doBarPlot, data = df)` but apparently was incorrectly passing the second input variable so it was erroring out. Very helpful — agf1997, May 08 '18 at 16:43

Amar · Answer 2 · 2018-05-08T08:15:58.590

The apply family of functions iterates over the elements of the object passed. A simple example to illustrate this:

lapply(mtcars, function(x) print(x))

With your code, you are passing a vector of each column in your df to the function doBarPlot. The ggplot2 package works with dataframes, not lists or vectors and therefore you get the error.

If you want to use your function, apply it directly to the subsetted df:

doBarPlot(df[ , c])

If you have a bunch of dataframes and you want to subset by the columns in c, check out this answer: How to apply same function to every specified column in a data.table

or alternatively, look into the dplyr::select()

edited May 08 '18 at 08:15

answered May 08 '18 at 07:46


`doBarPlot(df[c])` doesn't seem to work as it doesn't create a plot for each column in `c` ... just the first one. – agf1997 May 08 '18 at 08:04
I misunderstood your question. I will update my answer soon. – Amar May 08 '18 at 08:16
