I would like to process a data frame through dplyr and ggplot using column names given as strings. Here is my code:

library(ggplot2)
library(dplyr)
my_df <- data.frame(var_1 = sample(c('a', 'b', 'c'), 1000, replace = TRUE),
                    var_2 = sample(c('d', 'e', 'f'), 1000, replace = TRUE))

name_list = c('var_1', 'var_2')

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
    test <- my_df %>% group_by(el) %>% summarize(count = n())
    ggplot(data = test, aes(x = el, y = count)) + geom_bar(stat='identity')
  dev.off()
}

The above code obviously does not work, so I tried different things like UQ and as.name. UQ creates a column with extra quotes, and ggplot does not understand it with aes_string. Any suggestions?

I can use for (el in names(my_df)) with filtering, but would prefer to work with strings.

UPDATE: Here are the detailed messages/errors that I got:

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
    test <- my_df %>% group_by(!!el) %>% summarize(count = n())
    ggplot(data = test, aes_string(x = el, y = 'count')) + geom_bar(stat='identity')
  dev.off()
}

The above code generates empty files.

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
    test <- my_df %>% group_by(UQ(el)) %>% summarize(count = n())
    ggplot(data = test, aes_string(x = el, y = 'count')) + geom_bar(stat='identity')
  dev.off()
}

The above code also generates empty files.

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
    test <- my_df %>% group_by(as.name(el)) %>% summarize(count = n())
    ggplot(data = test, aes_string(x = el, y = 'count')) + geom_bar(stat='identity')
  dev.off()
}

produces

Error in mutate_impl(.data, dots) : 
  Column `as.name(el)` is of unsupported type symbol
user1700890
  • Related: (dplyr) https://stackoverflow.com/questions/45442513/dplyr-0-7-specify-grouping-variable-as-string (ggplot) https://stackoverflow.com/questions/15458526/r-pass-variable-column-indices-to-ggplot2 – MrFlick Sep 22 '17 at 20:26
  • Show your attempts with `UQ` and `aes_string`. Give the error messages you got. Otherwise each part of this is just a duplicate of a simpler question. – MrFlick Sep 22 '17 at 20:27

2 Answers

You need to UQ (or !!) the name/symbol, not the string itself. For example:

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
  test <- my_df %>% group_by(UQ(as.name(el))) %>% summarize(count = n())
  print(ggplot(data = test, aes_string(x = el, y = 'count')) + geom_bar(stat='identity'))
  dev.off()
}
MrFlick
  • Oh. You need to print() the ggplot object inside a loop with this method. Didn't think about that problem because I missed that part. – MrFlick Sep 22 '17 at 21:05
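
The empty files in the question are consistent with this comment: at the top level R auto-prints a ggplot object, which is what draws it, but inside a for loop nothing is auto-printed, so the open pdf device never receives a plot. A minimal sketch of the difference follows; the `toy` data frame and the `out_file` path are hypothetical and exist only for illustration.

```r
library(ggplot2)

# Hypothetical toy data, used only for this illustration
toy <- data.frame(x = c("a", "b"), count = c(3, 5))

# Assumption: write to a temporary directory rather than the working directory
out_file <- file.path(tempdir(), "toy.pdf")

pdf(out_file)
for (i in 1) {
  p <- ggplot(data = toy, aes(x = x, y = count)) + geom_bar(stat = "identity")
  # A bare `p` here would be evaluated and discarded without drawing anything,
  # because auto-printing only happens at the top level.
  print(p)  # explicitly draw the plot on the open pdf device
}
dev.off()
```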

I made two changes to your code:

  1. To group by a variable whose name is given as a string in dplyr, use group_by_ instead of group_by;
  2. To refer to that variable in ggplot2, use aes_string or get(variable).

I also added minor changes (e.g. ggsave to save plots).

library(ggplot2)
library(dplyr)
my_df <- data.frame(var_1 = sample(c('a', 'b', 'c'), 1000, replace = TRUE),
                    var_2 = sample(c('d', 'e', 'f'), 1000, replace = TRUE))

name_list = c('var_1', 'var_2')

for(el in name_list){
    p <- my_df %>% 
         group_by_(el) %>% 
         summarize(count = n()) %>%
         ggplot(aes(x = get(el), y = count)) +
             geom_bar(stat = "identity")
    ggsave(paste0(el, ".pdf"), p)
}
pogibas
  • `ggsave` made all the difference, but without `ggsave` it does not work – user1700890 Sep 22 '17 at 21:03
  • The reason `ggsave` is needed for this to work is that `get()` is evaluated lazily. You need `ggsave` inside the loop to make sure `get()` pulls the correct string. I found that if I use `aes_string` instead of `get()` then things are fine. I.e.: `ggplot(aes_string(x = el, y = "count")) ...` – winni2k Nov 03 '21 at 09:50
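
For readers on newer package versions: `group_by_` and `aes_string` were deprecated in later dplyr and ggplot2 releases. A minimal sketch of the same loop, assuming dplyr >= 1.0.0 (for `across(all_of(...))`) and ggplot2 >= 3.0.0 (for the `.data` pronoun inside `aes`), could look like the following. The `out_dir` variable, the fixed seed, and the plot dimensions are assumptions added for the example and are not part of the answers above.

```r
library(ggplot2)
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
my_df <- data.frame(var_1 = sample(c('a', 'b', 'c'), 1000, replace = TRUE),
                    var_2 = sample(c('d', 'e', 'f'), 1000, replace = TRUE))

name_list <- c('var_1', 'var_2')
out_dir <- tempdir()  # assumption: write the PDFs to a temporary directory

for (el in name_list) {
  # all_of(el) selects the column whose name is stored in the string `el`
  counts <- my_df %>%
    group_by(across(all_of(el))) %>%
    summarize(count = n())

  # .data[[el]] looks the column up by its string name inside aes()
  p <- ggplot(counts, aes(x = .data[[el]], y = count)) +
    geom_bar(stat = "identity")

  # Saving inside the loop renders the plot while `el` still holds this name
  ggsave(file.path(out_dir, paste0(el, ".pdf")), plot = p, width = 6, height = 4)
}
```

Because `el` inside `aes()` is only looked up when the plot is rendered, this sketch saves each plot within the same iteration instead of collecting plots and drawing them after the loop.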