For each variable in my data frame I make a histogram, a line plot and a box plot to assess its distribution, and I arrange these plots in one window.

For a variable VARIABLE, my code looks like this:

variable_name_string = "VARIABLE"

hist = qplot(VARIABLE, data = full_data_noNO, geom="histogram", 
fill=I("lightblue"))+
theme_light()

avg_price = full_data_noNO %>% 
group_by(Month, Country) %>%
dplyr::summarize(avg = mean(VARIABLE, na.rm = 
TRUE))

#line graph for different countries over time
line = ggplot(data=avg_price, aes(x=anydate(Month), y=VARIABLE, 
group=Country)) +
xlab("Date")+
ylab(variable_name_string)+
geom_line(aes(color=Country), size = 1)+
theme_light()

#boxplot over different years
avg_price2 = avg_price
avg_price2$Month = format(as.Date(anydate(avg_price$Month), "%Y-%m-%d"), 
"%Y")

box = ggplot(avg_price2, aes(x = Month, y=VARIABLE, fill = Month)) + 
geom_boxplot()+
xlab("Date")+
ylab(variable_name_string)+
guides(fill=FALSE)+
theme_light()

var_name = grid.text(variable_name_string, gp=gpar(fontsize=20))

#merge plot into one window
grid.arrange(var_name, hist, line, box, ncol=2)

This works fine for one variable, but now I want to do this for every variable in my dataframe and save the merged plot window for all variables. I have been looking for almost the entire day but I cannot find a solution. Can anyone help me?

Activation
  • The best way to do this is to turn each of those plotting tasks into a function that takes a list of names and a data frame, modifies them as desired, and then plots them consecutively. Then you simply pass the function the data frame and the list of variables you want plotted. – sconfluentus Nov 04 '18 at 17:02
  • Or you could create a for loop and iterate through that way... – sconfluentus Nov 04 '18 at 17:02
  • Welcome to SO! Could you make your problem reproducible by sharing a sample of your data and the code you're working on so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Nov 04 '18 at 17:25
  • My main problem is that I don't know how to set the `VARIABLE` to a different value in a loop. If I just set it to a variable I get the error `argument is not numeric or logical: returning NA`. Does anyone know a solution for this? – Activation Nov 05 '18 at 08:20
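
The warning quoted in the last comment can be reproduced in isolation. The sketch below uses a made-up data frame (`toy`, `price` and `volume` are placeholders, not the asker's data) and shows that taking the mean of a variable name held as a string returns `NA`, while looking the column up by name with `[[` gives a numeric result.

```r
# Toy data; the column names and values are invented for illustration.
toy <- data.frame(price = c(1, 2, 3), volume = c(10, 20, 30))
vars <- c("price", "volume")

# mean() of the string itself: this is what triggers
# "argument is not numeric or logical: returning NA".
string_mean <- suppressWarnings(mean("price"))

# toy[[v]] fetches the column whose name is stored in v,
# so the mean is computed on the actual numbers.
column_means <- vapply(
  vars,
  function(v) mean(toy[[v]], na.rm = TRUE),
  numeric(1)
)

print(string_mean)
print(column_means)
```

Inside dplyr or ggplot2 calls, a string has to be converted in a similar way before it can stand in for a column.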

1 Answer

Without a reproducible example it is hard to help, but you could try wrapping your plotting code in a function and using lapply to call that function for each of your variables.

make_plots <- function (variable_string) {
  var_quo <- rlang::sym(variable_string)
  hist = qplot(!!var_quo, data = full_data_noNO, geom="histogram", 
               fill=I("lightblue"))+
    theme_light()

  avg_price = full_data_noNO %>% 
    group_by(Month, Country) %>%
    dplyr::summarize(avg = mean(!!var_quo, na.rm = 
                                  TRUE))

  #line graph for different countries over time
  # avg_price only contains Month, Country and avg, so y is mapped to avg
  line = ggplot(data=avg_price, aes(x=anydate(Month), y=avg, 
                                    group=Country)) +
    xlab("Date")+
    ylab(variable_string)+
    geom_line(aes(color=Country), size = 1)+
    theme_light()

  #boxplot over different years
  avg_price2 = avg_price
  avg_price2$Month = format(as.Date(anydate(avg_price$Month), "%Y-%m-%d"), 
                            "%Y")

  # the summarised values are stored in the avg column
  box = ggplot(avg_price2, aes(x = Month, y=avg, fill = Month)) + 
    geom_boxplot()+
    xlab("Date")+
    ylab(variable_string)+
    guides(fill="none")+
    theme_light()

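  # grid.text() needs the label as a plain string; draw = FALSE only builds the grob
  var_name = grid.text(variable_string, gp=gpar(fontsize=20), draw = FALSE)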

  #merge plot into one window
  combined <- grid.arrange(var_name, hist, line, box, ncol=2)

  # Save combined plot at VARIABLE_plots.pdf
  ggsave(paste0(variable_string, "_plots.pdf"), combined)
  combined
}

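# Pass the variable names as a named character vector so that the
# returned list can be indexed by name
vars <- c("VARIABLE1", "VARIABLE2")
plots <- lapply(setNames(vars, vars), make_plots)
# OR use every column except the grouping columns
vars <- setdiff(colnames(full_data_noNO), c("Month", "Country"))
plots <- lapply(setNames(vars, vars), make_plots)

# Individual combined plots can be redrawn with grid.draw()
grid::grid.newpage()
grid::grid.draw(plots[["VARIABLE1"]])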
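
The quoting step can also be tried on its own. This is a minimal sketch on an invented data frame (`toy`, `price` and the helper `summarise_var` are hypothetical names, and `.groups = "drop"` assumes dplyr 1.0.0 or later): `rlang::sym()` turns the string into a symbol, and `!!` unquotes it so that `mean()` sees the column rather than the string.

```r
library(dplyr)

# Toy stand-in for full_data_noNO; all values are invented.
toy <- data.frame(
  Month   = c("2018-01", "2018-01", "2018-02", "2018-02"),
  Country = c("A", "A", "A", "A"),
  price   = c(1, 3, 5, 7)
)

# Hypothetical helper: average one column, chosen by name, per Month and Country.
summarise_var <- function(data, variable_string) {
  var_sym <- rlang::sym(variable_string)  # string -> symbol
  data %>%
    group_by(Month, Country) %>%
    summarise(avg = mean(!!var_sym, na.rm = TRUE), .groups = "drop")
}

res <- summarise_var(toy, "price")
print(res)
```

The summarised column is called `avg` whatever variable is passed in, so any plot built on the summary has to map `y` to `avg` and not to the original variable name.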
Clemens Hug
  • Thanks! I am going to try this! – Activation Nov 04 '18 at 19:29
  • My data are just numerical variables, but I cannot post it here since it is client data. However, I get the following warning using this approach `In mean.default(VARIABLE, na.rm = TRUE) : argument is not numeric or logical: returning NA` . So in this part `dplyr::summarize(avg = mean(VARIABLE, na.rm = TRUE))` it doesn't work to use the VARIABLE name as real variable. Do you maybe know how to solve this? – Activation Nov 05 '18 at 08:15
  • You can always post a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that resembles your real data. I tried to fix the function, I suspect it was a quoting issue. Make sure you pass the variable names as character vector into lapply. – Clemens Hug Nov 05 '18 at 14:58
  • This is indeed the solution! Thanks! – Activation Nov 05 '18 at 16:12