I have read http://dplyr.tidyverse.org/articles/programming.html about non-standard evaluation in dplyr, but I still can't get things to work.

plot_column <- "columnA"

raw_data %>%
    group_by(.dots = plot_column) %>%
    summarise (percentage = mean(columnB)) %>%
    filter(percentage > 0) %>%
    arrange(percentage) %>%
    # mutate(!!plot_column := factor(!!plot_column, !!plot_column))%>%
    ggplot() + aes_string(x=plot_column, y="percentage")  +
  geom_bar(stat="identity", width = 0.5) +
  coord_flip()

This works fine when the mutate statement is commented out. However, when I enable it to order the bars by height, only a single bar is plotted.

How can I convert the statement above into a function (or make it use a variable for the column) and still plot multiple bars ordered by their size?

An example dataset could be:

columnA,columnB
a, 1
a, 0.4
a, 0.3
b, 0.5

Edit

A sample:

mtcars %>%
  group_by(mpg) %>%
  summarise(mean_col = mean(cyl)) %>%
  filter(mean_col > 0) %>%
  arrange(mean_col) %>%
  # fix the factor levels in the sorted order so the bars are ordered by height
  mutate(mpg = factor(mpg, levels = mpg)) %>%
  ggplot() + aes(x = mpg, y = mean_col) +
  geom_bar(stat = "identity") +
  coord_flip()

This outputs an ordered bar chart. How can I wrap it in a function where the column can be replaced and I still get multiple bars?

Georg Heiler

1 Answer

This works with dplyr 0.7.0 and ggplot2 2.2.1:

rm(list = ls())
library(ggplot2)
library(dplyr)
raw_data <- tibble(columnA = c("a", "a", "b", "b"), columnB = c(1, 0.4, 0.3, 0.5))

plot_col <- function(df, plot_column, val_column){

  pc <- enquo(plot_column)
  vc <- enquo(val_column)
  pc_name <- quo_name(pc) # generate a name from the enquoted statement!

  df <- df %>%
   group_by(!!pc) %>%
   summarise (percentage = mean(!!vc)) %>%
   filter(percentage > 0) %>%
   arrange(percentage) %>%
   mutate(!!pc_name := factor(!!pc, !!pc)) # insert pc_name here!

  ggplot(df) + aes_(y = ~percentage, x = substitute(plot_column)) +
    geom_bar(stat="identity", width = 0.5) +
    coord_flip()
}
plot_col(raw_data, columnA, columnB)
plot_col(mtcars, mpg, cyl)

The problem I ran into was that ggplot2 and dplyr use different kinds of non-standard evaluation. I found the answer in this question: Creating a function using ggplot2.
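As a side note, here is a minimal sketch for the case where the column names arrive as strings, as with plot_column <- "columnA" in the question. As far as I understand it, `!!plot_column` on a string unquotes to the literal string "columnA" rather than to the column, so factor() only ever sees one value and the plot collapses to a single bar. Converting the string to a symbol first avoids that. The function name plot_col_chr is made up for illustration, and I am assuming rlang is available (it is installed as a dependency of dplyr). I have not checked this against every dplyr/ggplot2 version.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: both column names are passed as strings.
plot_col_chr <- function(df, plot_column, val_column) {
  # Turn the strings into symbols so that !! inserts a column reference,
  # not the literal string.
  pc <- rlang::sym(plot_column)
  vc <- rlang::sym(val_column)

  summary_df <- df %>%
    group_by(!!pc) %>%
    summarise(percentage = mean(!!vc)) %>%
    filter(percentage > 0) %>%
    arrange(percentage) %>%
    # Overwrite the grouping column with a factor whose levels follow
    # the sorted order; a string is allowed on the left of :=.
    mutate(!!plot_column := factor(!!pc, levels = !!pc))

  # aes_string() matches the question's code; newer ggplot2 releases
  # mark it as deprecated.
  ggplot(summary_df) +
    aes_string(x = plot_column, y = "percentage") +
    geom_bar(stat = "identity", width = 0.5) +
    coord_flip()
}

# Example data from the question.
raw_data <- tibble(
  columnA = c("a", "a", "a", "b"),
  columnB = c(1, 0.4, 0.3, 0.5)
)

p <- plot_col_chr(raw_data, "columnA", "columnB")
```

Printing p should draw one bar per group. The enquo() version above is still the better fit if you prefer to pass bare column names.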

EDIT: parameterized the value column (e.g. columnB/cyl) and added mtcars example.

friep
  • It does not seem to be 100% there yet. I still get the original result of a single bar instead of one bar per group. – Georg Heiler Aug 03 '17 at 08:24
  • oh sorry, let me check! – friep Aug 03 '17 at 09:03
  • that's weird, i did a clean session of R and it worked for me. I will add this to the code now together with the package versions. – friep Aug 03 '17 at 09:07
  • Thanks. Nearly there now. However for me the ordering of the factors is not yet working properly now. – Georg Heiler Aug 03 '17 at 09:42
  • yeah, it doesn't work yet for me with the mtcars example either. i get a warning and it looks quite messy.. i'll try after my lunch break to implement `reorder`.. – friep Aug 03 '17 at 09:55
  • I found the issue. instead of replacing columnA with its factor, the old version of the code just added a new column "pc" to the data. Of course, then using plot_column in ggplot will not access the factorized version. I fixed it now and added comments where i changed something. I hope it works now. – friep Aug 03 '17 at 11:17