I have a large dataset with 30 different variables. I want to investigate some characteristics of each variable by making a histogram for each variable. For example, for my variable A this now looks like:

hist = qplot(A, data = full_data_noNO, geom="histogram", 
    binwidth = 50, fill=I("lightblue"))+
    theme_light()

Now I want to do this for all my variables. Does anyone know how I can loop through the names of all variables in my data frame, so that A changes on each iteration?

Also, I want to loop through all variables in this code for the same purpose:

avg_price = full_data_noNO %>% 
    group_by(Month, Country) %>%
    dplyr::summarize(total = mean(A, na.rm = TRUE))
Activation
  • Take a look at [ggplot2 - create a barplot for every column of a dataframe](https://stackoverflow.com/questions/52822840/ggplot2-create-a-barplot-for-every-column-of-a-dataframe). You'll find two different approaches that hopefully help you solve your problem. – markus Nov 01 '18 at 13:11

2 Answers

You could reference your variables by column number:

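library(ggplot2)
histograms = list()
for (i in seq_along(full_data_noNO)) {
  # [[i]] returns a plain vector even for tibbles; wrapping it in a data frame
  # fixes the values now, so each stored plot keeps its own column
  col_df = data.frame(value = full_data_noNO[[i]])
  histograms[[i]] = qplot(value, data = col_df, geom = "histogram",
      binwidth = 50, fill = I("lightblue")) +
      theme_light()
}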
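
A variant of the loop above that iterates over column names instead of positions, sketched with `ggplot()` directly since `qplot()` is deprecated as of ggplot2 3.4.0. The data frame `df` and the helper `plot_histogram()` are made up for illustration and stand in for `full_data_noNO`; the `bins` value is an arbitrary choice that would need tuning per variable.

```r
library(ggplot2)

# Hypothetical example data standing in for full_data_noNO
set.seed(1)
df <- data.frame(x1 = rpois(100, 3), x2 = rnorm(100), x3 = rexp(100, 1 / 2))

# Hypothetical helper: histogram of one column, looked up by name
plot_histogram <- function(data, var) {
  ggplot(data, aes(x = .data[[var]])) +
    geom_histogram(bins = 30, fill = "lightblue") +
    labs(x = var) +
    theme_light()
}

# One plot per column, stored in a list named after the columns
histograms <- lapply(names(df), function(v) plot_histogram(df, v))
names(histograms) <- names(df)

# Storing the plots does not draw them; print each one explicitly
for (p in histograms) print(p)
```

Because each plot is built inside its own function call, it keeps the column name it was created with, and a single plot can be pulled out by name, e.g. `histograms[["x2"]]`.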
Fino
  • Then I get the error `Error: Aesthetics must be either length 1 or the same as the data (961): colour, x, y, group` when making a boxplot and using aes(). Do you maybe know how to solve this? – Activation Nov 01 '18 at 13:25
  • Can you post a snippet of your data? I'm testing with `df = data.frame(x1 = rpois(100,3), x2 = rnorm(100,0,1), x3 = rexp(100,1/2))` and it's working fine. – Fino Nov 01 '18 at 13:35
  • Closing curly brace for the `for` loop is missing. Also, you need to output the plots in list as loop will only store them. – Parfait Nov 01 '18 at 13:37
  • It worked once I converted my data vector to a data frame. However, `dplyr::summarize(avg_CATALOGUE_PRICE = mean(full_data_noNO[,6], na.rm = TRUE))` returns the same value for every row, which is incorrect. Any idea why? – Activation Nov 01 '18 at 13:56
  • Edited to include the braces – Fino Nov 01 '18 at 14:47
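
On the constant values from `summarize()` mentioned in the comments: an expression like `full_data_noNO[,6]` refers to the whole column of the original data frame, not to the rows of the current group, so every group ends up with the overall mean. A minimal sketch of a grouped mean for all numeric variables at once, assuming dplyr 1.0.0 or later for `across()`; the data frame `df` is made up for illustration, with `Month` and `Country` mirroring the question's grouping columns.

```r
library(dplyr)

# Hypothetical example data standing in for full_data_noNO
df <- data.frame(
  Month   = c(1, 1, 2, 2, 2),
  Country = c("NL", "NL", "NL", "BE", "BE"),
  A       = c(10, 20, 30, 40, NA),
  B       = c(1, 3, 5, 7, 9)
)

# Per-group mean of every numeric column; the grouping columns are
# left out of across() automatically
avg_by_group <- df %>%
  group_by(Month, Country) %>%
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")
```

Referring to columns by bare name (or through `across()`) lets dplyr evaluate the mean within each group, which is what the original `mean(A, na.rm = TRUE)` call did for a single variable.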

If all your variables are numeric, then you can do the following to produce a list of all plots, which you can then explore one by one with list indexing:

library(tidyverse)
list_of_plots <-
  full_data_noNO %>%
  map(~ qplot(x = ., geom = "histogram"))
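
If some columns are not numeric, one possible extension is to filter them out first and carry the column names along as axis labels. This is a sketch under the assumption that purrr and ggplot2 are installed; `df` is a made-up data frame standing in for `full_data_noNO`, and the `bins` value is arbitrary.

```r
library(purrr)
library(ggplot2)

# Hypothetical example data with one non-numeric column
df <- data.frame(
  id = letters[1:5],
  x1 = c(1, 2, 2, 3, 5),
  x2 = c(0.5, 1.5, 1.5, 2.5, 4)
)

# keep() drops the non-numeric columns; imap() passes each column as .x
# and its name as .y, so every plot is labelled with its variable
list_of_plots <- imap(
  keep(df, is.numeric),
  ~ ggplot(data.frame(value = .x), aes(x = value)) +
      geom_histogram(bins = 10) +
      labs(x = .y)
)
```

The result is a named list, so individual plots can be inspected with `list_of_plots$x1` or `list_of_plots[["x2"]]`.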
meriops