How to write a function for plotting with ggplot which works with generic data frames and variables

Question

I often have to make plots that are essentially the same, differing only in the variables and/or data frames used:

p <- ggplot(data = data1, aes(x = variable1, y = ..density..)) +
    geom_histogram(bins = 15, alpha = 0.2, position = "identity", aes(fill = groupvar1)) + 
    geom_density(size = 1, aes(color = groupvar1))

p <- ggplot(data = data1, aes(x = variable2, y = ..density..)) +
    geom_histogram(bins = 15, alpha = 0.2, position = "identity", aes(fill = group2)) + 
    geom_density(size = 1, aes(color = group2))

p <- ggplot(data = data2, aes(x = variable3, y = ..density..)) +
    geom_histogram(bins = 15, alpha = 0.2, position = "identity", aes(fill = group3)) + 
    geom_density(size = 1, aes(color = group3))

  .
  .
  .

and so on. Instead of duplicating nearly identical code multiple times, I would like to write a single function that I can use with different data frames and different variables to be plotted, with or without a grouping variable. Something like:

library(ggplot2)

my_data <- data.frame(y = rnorm(100, 0, 1), z = runif(100, 0, 1),
                      group1 = rep(c("A", "B"), each = 50),
                      group2 = as.factor(rep(1:4, each = 25)))

variable_distribution <- function(dataframe, myvar, groupvar = NULL) {
    p <- ggplot(data = dataframe, aes(x = myvar, y = ..density..)) 
    if (is.null(groupvar)) {
        p <- p + geom_histogram(bins = 15, alpha = 0.2, position = "identity") + 
            geom_density(size = 1)
    }
    else {
        p <- p + geom_histogram(bins = 15, alpha = 0.2, position = "identity", aes(fill = groupvar)) + 
            geom_density(size = 1, aes(color = groupvar))
    }
    print(p)
}

Some results:

variable_distribution(my_data, my_data$y, my_data$group1)

variable_distribution(my_data, my_data$z, my_data$group2)

There are several issues with my code:

The labels are not what I would like them to be. In the first call, I would like the x-label to be y instead of myvar, and the legend title to be group1 instead of groupvar. In the second call, the x-label should be z and the legend title group2.
y and group1 are columns of my_data, so it seems redundant to pass them as two vectors "separated" from my_data.

PS I don't want to address the variables by column number, because that makes the code much less readable. I'd like an interface such as

variable_distribution(my_data, y, group1)

or

variable_distribution("my_data", "y", "group1")

Or something like that...

EDIT: the solution in the linked questions just doesn't work, as someone might have noticed if he/she had actually tried to answer the question instead of concentrating on which question this one should be a duplicate of. Look:

variable_distribution <- function(dataframe, x_string, group_string = NULL) {
    p <- ggplot(data = dataframe, aes_string(x = x_string, y = ..density..)) 
    if (is.null(group_string)) {
        p <- p + geom_histogram(bins = 15, alpha = 0.2, position = "identity") + 
            geom_density(size = 1)
    }
    else {
        p <- p + geom_histogram(bins = 15, alpha = 0.2, position = "identity", aes(fill = group_string)) + 
            geom_density(size = 1, aes(color = group_string))
    }
    print(p)
}

variable_distribution(my_data, "y", "group1")
>Error in aes_string(x = x_string, y = ..density..) : 
  object '..density..' not found

Your current function can fail catastrophically with e.g. facets, and without warning! Please map your variables properly (with `aes_` or `aes_string`). — Axeman, Mar 16 '17 at 10:00
@Axeman, as a matter of fact I do need to use facets in the actual code: here I removed `facet_wrap` to simplify the question. I tried to read about `aes_`, but the help of `ggplot2` is not clear enough for me: I don't understand what I should pass to my function as argument `myvar` if I used `aes_` instead of `aes`. What about writing an answer :)? Otherwise I'll go for `aes_string`, but the help of the two functions says that `aes_` should be preferred... — DeltaIV, Mar 16 '17 at 10:09
@Axeman the question you linked to explicitly asks to pass column indices, while I explicitly said I do **not** want to pass column indices. — DeltaIV, Mar 16 '17 at 10:15
@Axeman, ah, but I see that Paul Hiemstra's answer does not use column indices. Ok. However, I would have also liked to see an answer with `aes_`, since ggplot help says it's better than `aes_string` (I don't understand why: something related to "non standard evaluation", which I don't know) — DeltaIV, Mar 16 '17 at 10:17
Use something like `f <- function(d, v, g) { ggplot(d, aes_(x = substitute(v), y = ~..density.., fill = substitute(g))) }; f(my_data, y, group1)`. — Axeman, Mar 16 '17 at 10:26
`aes_string` doesn't work! I ask that the question be reopened because the linked answer uses `aes_string`, and it doesn't work. — DeltaIV, Mar 16 '17 at 10:33
`aes_string` does work once you understand that, when you use it, all variable names need to be in quotes (see the help page). So you would need `y = "..density.."`. Don't forget to also use `aes_string` when mapping fill/color later in your function. Certainly `aes_`, which replaces `aes_q` as outlined in one of the answers in the duplicates, is another option. — aosmith, Mar 17 '17 at 15:19
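
For reference, the two suggestions in the comments above could be sketched roughly as follows. This is only a minimal sketch, not a tested answer: the function names `variable_distribution_str` and `variable_distribution_nse` and the `set.seed(1)` call are made up for illustration, and the rest is the question's own code with the mappings changed as the comments describe (quoted `"..density.."` with `aes_string`, and `substitute()` plus `~..density..` with `aes_`). Newer ggplot2 versions may emit deprecation warnings for `aes_string`, `aes_`, `..density..` and `size`, and facets have not been tried here.

```r
library(ggplot2)

set.seed(1)  # assumption: added only to make the toy data reproducible
my_data <- data.frame(y = rnorm(100, 0, 1), z = runif(100, 0, 1),
                      group1 = rep(c("A", "B"), each = 50),
                      group2 = as.factor(rep(1:4, each = 25)))

# Variant 1 (hypothetical name): column names are passed as strings and
# mapped with aes_string(). Every aesthetic is a string, including the
# computed variable "..density..", as aosmith's comment points out.
variable_distribution_str <- function(dataframe, x_string, group_string = NULL) {
    p <- ggplot(data = dataframe, aes_string(x = x_string, y = "..density.."))
    if (is.null(group_string)) {
        p <- p + geom_histogram(bins = 15, alpha = 0.2, position = "identity") +
            geom_density(size = 1)
    } else {
        p <- p + geom_histogram(bins = 15, alpha = 0.2, position = "identity",
                                aes_string(fill = group_string)) +
            geom_density(size = 1, aes_string(color = group_string))
    }
    p  # return the plot object; it prints when called at top level
}

# Variant 2 (hypothetical name): bare column names are captured with
# substitute() and mapped with aes_(), following Axeman's comment.
variable_distribution_nse <- function(dataframe, myvar, groupvar = NULL) {
    x_expr <- substitute(myvar)
    group_expr <- substitute(groupvar)  # NULL when no grouping variable is given
    p <- ggplot(data = dataframe, aes_(x = x_expr, y = ~..density..))
    if (is.null(group_expr)) {
        p <- p + geom_histogram(bins = 15, alpha = 0.2, position = "identity") +
            geom_density(size = 1)
    } else {
        p <- p + geom_histogram(bins = 15, alpha = 0.2, position = "identity",
                                aes_(fill = group_expr)) +
            geom_density(size = 1, aes_(color = group_expr))
    }
    p
}

# Usage: the interfaces asked for in the question.
p_str <- variable_distribution_str(my_data, "y", "group1")
p_nse <- variable_distribution_nse(my_data, z, group2)
# print(p_str); print(p_nse)  # uncomment to draw the plots
```

Because the mappings now refer to column names rather than to vectors passed in from outside, the axis label and legend title should come from the column names (y and group1, z and group2) instead of myvar and groupvar.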
