
I often have to make plots that are essentially the same, differing only in the variables and/or data frames used:

```r
p <- ggplot(data = data1, aes(x = variable1, y = ..density..)) +
    geom_histogram(bins = 15, alpha = 0.2, position = "identity", aes(fill = groupvar1)) + 
    geom_density(size = 1, aes(color = groupvar1))

p <- ggplot(data = data1, aes(x = variable2, y = ..density..)) +
    geom_histogram(bins = 15, alpha = 0.2, position = "identity", aes(fill = group2)) + 
    geom_density(size = 1, aes(color = group2))

p <- ggplot(data = data2, aes(x = variable3, y = ..density..)) +
    geom_histogram(bins = 15, alpha = 0.2, position = "identity", aes(fill = group3)) + 
    geom_density(size = 1, aes(color = group3))
```

...

and so on. Instead of duplicating nearly identical code many times, I would like to write a single function that I can use with different data frames, different variables to plot, and with or without a grouping variable. Something like:

```r
library(ggplot2)

my_data <- data.frame(y = rnorm(100, 0, 1), z = runif(100, 0, 1), 
                      group1 = rep(c("A", "B"), each = 50), 
                      group2 = as.factor(rep(1:4, each = 25)))

variable_distribution <- function(dataframe, myvar, groupvar = NULL) {
    p <- ggplot(data = dataframe, aes(x = myvar, y = ..density..)) 
    if (is.null(groupvar)) {
        p <- p + geom_histogram(bins = 15, alpha = 0.2, position = "identity") + 
            geom_density(size = 1)
    }
    else {
        p <- p + geom_histogram(bins = 15, alpha = 0.2, position = "identity", aes(fill = groupvar)) + 
            geom_density(size = 1, aes(color = groupvar))
    }
    print(p)
}
```

Some results:

```r
variable_distribution(my_data, my_data$y, my_data$group1)
```


```r
variable_distribution(my_data, my_data$z, my_data$group2)
```


There are two issues with my code:

  1. The labels are not what I want. In the first call, I would like the x-axis label to be `y` instead of `myvar`, and the legend title to be `group1` instead of `groupvar`. In the second call, the x-axis label should be `z` and the legend title `group2`.
  2. `y` and `group1` are columns of `my_data`, so it seems redundant to pass them as two separate vectors in addition to `my_data`.
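Both issues can be addressed with tidy evaluation, which ggplot2 gained after this question was written. The following is a minimal sketch, assuming ggplot2 >= 3.4.0 (for `after_stat()` and `linewidth`, the current replacements for `..density..` and `size` on lines) and rlang's `{{ }}` operator. Because the columns are forwarded to `aes()` as bare names, ggplot2 evaluates them inside `dataframe` and uses the column names for the x-axis label and the legend title.

```r
library(ggplot2)

my_data <- data.frame(y = rnorm(100, 0, 1), z = runif(100, 0, 1),
                      group1 = rep(c("A", "B"), each = 50),
                      group2 = as.factor(rep(1:4, each = 25)))

# Sketch assuming ggplot2 >= 3.4.0: columns are passed as bare names and
# forwarded with {{ }}, so they are looked up inside `dataframe`.
variable_distribution <- function(dataframe, myvar, groupvar = NULL) {
  # TRUE when the caller did not supply a grouping column
  no_group <- rlang::quo_is_null(rlang::enquo(groupvar))

  p <- ggplot(dataframe, aes(x = {{ myvar }}, y = after_stat(density)))
  if (no_group) {
    p <- p +
      geom_histogram(bins = 15, alpha = 0.2, position = "identity") +
      geom_density(linewidth = 1)
  } else {
    # fill and colour get the same column, hence the same legend title
    p <- p +
      geom_histogram(bins = 15, alpha = 0.2, position = "identity",
                     aes(fill = {{ groupvar }})) +
      geom_density(linewidth = 1, aes(colour = {{ groupvar }}))
  }
  p
}

print(variable_distribution(my_data, y, group1))  # x label y, legend group1
print(variable_distribution(my_data, z, group2))  # x label z, legend group2
print(variable_distribution(my_data, y))          # no grouping variable
```

Returning `p` instead of printing it inside the function also lets the caller add further layers, for example a `facet_wrap()`, before the plot is drawn.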

PS: I don't want to refer to the variables by column number, because that makes the code much less readable. I'd like an interface such as

```r
variable_distribution(my_data, y, group1)
```

or

```r
variable_distribution("my_data", "y", "group1")
```

Or something like that...
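For the string-based interface, one option is the `.data` pronoun, which looks a column up by a name stored in a character variable. This is a sketch assuming ggplot2 >= 3.4.0; `variable_distribution2` is a hypothetical name, and `labs()` sets the axis and legend titles explicitly so they do not depend on how ggplot2 derives default labels. The data frame itself is still passed as an object rather than as the string `"my_data"`.

```r
library(ggplot2)

my_data <- data.frame(y = rnorm(100, 0, 1), z = runif(100, 0, 1),
                      group1 = rep(c("A", "B"), each = 50),
                      group2 = as.factor(rep(1:4, each = 25)))

# Hypothetical string-based variant: column names arrive as character strings
variable_distribution2 <- function(dataframe, x_string, group_string = NULL) {
  # .data[[x_string]] selects the column whose name is stored in x_string
  p <- ggplot(dataframe, aes(x = .data[[x_string]], y = after_stat(density))) +
    labs(x = x_string)
  if (is.null(group_string)) {
    p <- p +
      geom_histogram(bins = 15, alpha = 0.2, position = "identity") +
      geom_density(linewidth = 1)
  } else {
    p <- p +
      geom_histogram(bins = 15, alpha = 0.2, position = "identity",
                     aes(fill = .data[[group_string]])) +
      geom_density(linewidth = 1, aes(colour = .data[[group_string]])) +
      # identical titles so fill and colour are merged into one legend
      labs(fill = group_string, colour = group_string)
  }
  p
}

print(variable_distribution2(my_data, "y", "group1"))
print(variable_distribution2(my_data, "z", "group2"))
```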

EDIT: the solution in the linked questions just doesn't work, as anyone would have noticed who had actually tried to answer the question instead of concentrating on which question this one duplicates. Look:

```r
variable_distribution <- function(dataframe, x_string, group_string = NULL) {
    p <- ggplot(data = dataframe, aes_string(x = x_string, y = ..density..)) 
    if (is.null(group_string)) {
        p <- p + geom_histogram(bins = 15, alpha = 0.2, position = "identity") + 
            geom_density(size = 1)
    }
    else {
        p <- p + geom_histogram(bins = 15, alpha = 0.2, position = "identity", aes(fill = group_string)) + 
            geom_density(size = 1, aes(color = group_string))
    }
    print(p)
}

variable_distribution(my_data, "y", "group1")
```

```
Error in aes_string(x = x_string, y = ..density..) : 
  object '..density..' not found
```
DeltaIV
  • Read about `aes_string`. – zx8754 Mar 16 '17 at 09:58
  • Your current function can fail catastrophically with e.g. facets, and without warning! Please map your variables properly (with `aes_` or `aes_string`). – Axeman Mar 16 '17 at 10:00
  • @Axeman, as a matter of fact I do need facets in the actual code: I removed `facet_wrap` here to simplify the question. I tried to read about `aes_`, but the `ggplot2` help is not clear enough for me: I don't understand what I should pass to my function as the `myvar` argument if I used `aes_` instead of `aes`. What about writing an answer :)? Otherwise I'll go for `aes_string`, but the help page for the two functions says that `aes_` should be preferred... – DeltaIV Mar 16 '17 at 10:09
  • @Axeman the question you linked to explicitly asks to pass column indices, while I explicitly said I do **not** want to pass column indices. – DeltaIV Mar 16 '17 at 10:15
  • @Axeman, ah, but I see that Paul Hiemstra's answer does not use column indices. Ok. However, I would have also liked to see an answer with `aes_`, since ggplot help says it's better than `aes_string` (I don't understand why: something related to "non standard evaluation", which I don't know) – DeltaIV Mar 16 '17 at 10:17
  • Use something like `f <- function(d, v, g) { ggplot(d, aes_(x = substitute(v), y = ~..density.., fill = substitute(g))) }; f(my_data, y, group1)`. – Axeman Mar 16 '17 at 10:26
  • `aes_string` doesn't work! I ask that the question be reopened because the linked answer uses `aes_string`, and it doesn't work. – DeltaIV Mar 16 '17 at 10:33
  • `aes_string` does work once you understand that when you use it that all variable names need to be in quotes (see the help page). So you would need `y = "..density.."`. Don't forget to also use `aes_string` when mapping fill/color later in your function. Certainly `aes_`, which replaces `aes_q` as outlined in one of the answers in the duplicates, is another option. – aosmith Mar 17 '17 at 15:19
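Following aosmith's comment, here is a sketch of the EDIT function with the two fixes applied: `..density..` is quoted, and the fill/colour mappings also go through `aes_string()`, so they map the column named by `group_string` instead of the constant string `"group1"`. The rest is kept as in the question to make the difference easy to see. On recent ggplot2 versions, `aes_string()`, the `..density..` notation and `size` for lines all emit deprecation warnings, but the plot is still produced.

```r
library(ggplot2)

my_data <- data.frame(y = rnorm(100, 0, 1), z = runif(100, 0, 1),
                      group1 = rep(c("A", "B"), each = 50),
                      group2 = as.factor(rep(1:4, each = 25)))

variable_distribution <- function(dataframe, x_string, group_string = NULL) {
  # every aes_string() argument must be a string, including "..density..";
  # unquoted, R evaluates ..density.. as an ordinary object and fails
  p <- ggplot(data = dataframe, aes_string(x = x_string, y = "..density.."))
  if (is.null(group_string)) {
    p <- p + geom_histogram(bins = 15, alpha = 0.2, position = "identity") +
      geom_density(size = 1)
  } else {
    # aes_string() maps the column named by group_string;
    # aes(fill = group_string) would map the constant string itself
    p <- p + geom_histogram(bins = 15, alpha = 0.2, position = "identity",
                            aes_string(fill = group_string)) +
      geom_density(size = 1, aes_string(colour = group_string))
  }
  p
}

print(variable_distribution(my_data, "y", "group1"))
```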

0 Answers