Writing an R function with ggplot

Question

I have to plot multiple datasets in the same format, and after copy-pasting the code several times, I decided to write a function. I understand simple functions in R, and managed to write the following:

testplot <- function(data, mapping){
  output <- ggplot(data) +
    geom_bar(mapping,
             stat="identity",
             position='stack')
}
p <- testplot(df, aes(x=xvar, y=yvar, fill=type))

This works fine. However, my plot is more complicated and requires the "data" argument to go separately into each layer:

output <- ggplot() +
  geom_bar(df1, mapping,
           stat="identity",
           position='stack') +
  geom_errorbar(df1, ...) +
  geom_bar(df2, mapping,
  ... +
  geom_errorbar(df2, ...)

But when I put this into a function and try to run it like this:

testplot <- function(data, mapping){
  output <- ggplot() +
    geom_bar(data, mapping,
             stat="identity",
             position='stack')
}
p <- testplot(df, aes(x=xvar, y=yvar, fill=type))

it gives me an error:

Error: `data` must be a data frame, or other object coercible by `fortify()`, not an S3 object with class uneval
Did you accidentally pass `aes()` to the `data` argument?

Is there a way around it?

EDIT: when I try to include two data frames like this:

testplot <- function(data, data2, mapping){
  output <- ggplot() +
    geom_bar(data=data, mapping=mapping,
             stat="identity",
             position='stack',
             width = barwidth) +
    geom_bar(data2=data2, mapping=mapping,
             stat="identity",
             position='stack',
             width = barwidth)
}

p <- testplot(data=df, data2=df2, mapping=aes(x=norms_number, y=coeff.BLDRT, fill=type))

it says "Ignoring unknown parameters: data2"

I'm confused. Which version of `testplot` are you actually running? Where have you defined `df` here? I don't see how `output` is related to `p` here. – MrFlick, Feb 11 '20 at 20:51
`data2` is not a parameter of geom_bar(), only `data` is, so your second layer should be `geom_bar(data = data2, mapping = ...)`. See the rest of my edit for a full explanation. – Justin Landis, Feb 12 '20 at 22:34

Justin Landis · Answer 1 · 2020-02-12T22:28:57.267

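In most ggplot2 layer functions (geom_bar(), geom_errorbar(), and so on), the first argument is `mapping`, which takes the output of aes(), and `data` comes second. This is the reverse of ggplot() itself, whose first argument is `data`. So in your function definition, the data frame "data" is passed positionally to the `mapping` argument, and your aes() object ends up in the `data` argument, which is exactly what the error message complains about. To get around this, name the arguments explicitly (data = data, mapping = mapping) in your function definitions.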

For example:

testplot <- function(data, mapping){
  output <- ggplot() +
    geom_bar(data = data, mapping = mapping,
             stat="identity",
             position='stack')
}
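To see why naming the arguments matters, here is a small self-contained check. It inspects the argument order with formals() and then runs the fixed function on a made-up data frame that uses the column names from the question (xvar, yvar, type); the data values are hypothetical.

```r
library(ggplot2)

# Layer functions take `mapping` first; ggplot() itself takes `data` first
names(formals(geom_bar))[1:2]  # "mapping" "data"
names(formals(ggplot))[1:2]    # "data" "mapping"

# geom_bar() has no `data2` argument, hence "Ignoring unknown parameters"
"data2" %in% names(formals(geom_bar))  # FALSE

# Fixed version: arguments passed by name, so their order no longer matters
testplot <- function(data, mapping) {
  ggplot() +
    geom_bar(data = data, mapping = mapping,
             stat = "identity", position = "stack")
}

# Hypothetical data using the column names from the question
df <- data.frame(xvar = c("a", "a", "b"), yvar = c(1, 2, 3),
                 type = c("t1", "t2", "t1"))
p <- testplot(df, aes(x = xvar, y = yvar, fill = type))
```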

EDIT:

There are many ways to do this, and it really depends on how complex you want your function to be. If you are going to stick to a global aesthetic mapping, you can leave the mapping in the main ggplot() call with data = NULL, and then specify which data frame is associated with which layer. Consider the following reproducible example:

library(ggplot2)
data1 <- data.frame(v1=rnorm(10, 50, 20), v2=rnorm(10,30,5))
data2 <- data.frame(v1=rnorm(10, 100, 20), v2=rnorm(10,50,10))
plot_custom_ggplot <- function(df1, df2, mapping) {
    ggplot(data = NULL, mapping = mapping) +
      geom_point(data = df1, color = "blue") +
      geom_line(data = df2, color = "red")
}

plot_custom_ggplot(data1,data2, aes(x = v1,y = v2))

In this example, the mapping argument of each geom_* layer is left blank, and the mapping is instead inherited from the main ggplot() call.

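This is also how each layer normally knows which data to use: by default it inherits both the data and the mapping from the main ggplot() call. Specifying a data argument in a layer replaces the inherited data. Specifying a mapping argument in a layer is combined with the inherited mapping, with the layer's own aesthetics taking precedence, unless you set inherit.aes = FALSE. Any required aesthetics missing from the layer's own mapping are therefore looked up in the main call.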
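A minimal sketch of these inheritance rules, using made-up data: the first layer adds colour on top of the inherited x and y, while the second sets inherit.aes = FALSE and therefore has to supply its own x and y. It still inherits the plot's data, because inherit.aes only affects aesthetics. layer_data() is used here only to inspect what each layer ended up with.

```r
library(ggplot2)

# Made-up example data
d <- data.frame(v1 = 1:3, v2 = c(2, 4, 6), grp = c("A", "B", "A"))

p <- ggplot(d, aes(x = v1, y = v2)) +
  # Inherits x and y from ggplot(); adds colour on top of them
  geom_point(aes(colour = grp)) +
  # Inherits no aesthetics from the plot mapping, so x and y are given here;
  # the data (d) is still inherited from ggplot()
  geom_line(aes(x = v1, y = v2 * 2), inherit.aes = FALSE)

layer_data(p, 1)$colour  # two distinct colours, one per grp level
layer_data(p, 2)$y       # 4 8 12
```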

library(ggplot2)
data1 <- data.frame(v1=rnorm(10, 50, 20), v2=rnorm(10,30,5))
data2 <- data.frame(v1=rnorm(10, 100, 20), v2=rnorm(10,50,10), z = c("A","B"))
plot_custom_ggplot <- function(df1, df2, mapping) {
    ggplot(data = NULL, mapping = mapping) +
      geom_point(data = df1, color = "blue") +
      geom_line(data = df2, mapping = aes(color = z)) #inherits x and y mapping from main ggplot call.
}

plot_custom_ggplot(data1,data2, aes(x = v1,y = v2))

But adding extra aes mappings is risky if you are also specifying the data, because your data frame may not always contain the columns those mappings refer to.

plot_custom_ggplot(df1 = data2, df2 = data1, aes(x = v1, y = v2))
#Error in FUN(X[[i]], ...) : object 'z' not found
#
#the column z is not present in the data1 object -
#R then looked globally for a z object and didn't find anything.
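One way to make such a function fail with a clearer message is to check the required columns before building the plot. This is a sketch only: check_columns() is a hypothetical helper written for this example, not part of ggplot2, and the list of required columns ("z") is hard-coded to match the example above.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): fail early, with a readable
# message, when a data frame lacks columns that a layer mapping refers to
check_columns <- function(df, cols, arg = deparse(substitute(df))) {
  absent <- setdiff(cols, names(df))
  if (length(absent) > 0) {
    stop(arg, " is missing column(s): ", paste(absent, collapse = ", "),
         call. = FALSE)
  }
  invisible(df)
}

plot_custom_ggplot <- function(df1, df2, mapping) {
  check_columns(df2, "z")  # the geom_line() layer below maps color = z
  ggplot(data = NULL, mapping = mapping) +
    geom_point(data = df1, color = "blue") +
    geom_line(data = df2, mapping = aes(color = z))
}
```

With the swapped arguments from above, plot_custom_ggplot(df1 = data2, df2 = data1, aes(x = v1, y = v2)) should now stop with "df2 is missing column(s): z" before any plotting is attempted.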

I believe it is best practice to use tidy data when working with ggplot2, because things become much easier. There is usually no reason to use multiple data frames, especially if you plan to use one set of mappings for all of them. A good exception is a plotting function for a custom R object whose structure you know in advance.

Otherwise, consider and compare how these two functions work in this example:

set.seed(76) # set the seed before drawing the random numbers
data1 <- data.frame(v1=rnorm(20, 50, 20), v2=rnorm(20,30,5), letters= letters[1:20], id = "df1")
data2 <- data.frame(v1=rnorm(20, 100, 20), v2=rnorm(20,50,10), letters = letters[17:26], id = "df2")

plot_custom_ggplot2 <- function(df, mapping) {
  ggplot(data = df, mapping = mapping) +
    geom_bar(stat = "identity",
             position="stack")
}

plot_custom_ggplot <- function(df1, df2, mapping) {
  ggplot(data = NULL, mapping = mapping) +
    geom_bar(data = df1, stat = "identity",
             position="stack") +
    geom_bar(data = df2, stat = "identity",
             position="stack")
}

plot_custom_ggplot(data1,data2, aes(x = letters,y = v2, fill = id))
plot_custom_ggplot2(rbind(data1,data2), aes(x = letters, y = v2, fill = id))

In the first plot, the red (df1) bars for q, r, s, and t are hidden behind the blue (df2) bars: each layer stacks only its own rows, and the second layer is simply drawn over the first. In the second plot, these values actually stack, because they belong to a single layer rather than two separate ones.
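The difference can also be checked numerically with layer_data(), which returns the data each layer is drawn from after position adjustments. A sketch with small, deterministic stand-ins for data1 and data2 (the values are made up): at x = "b" the two-layer version draws a bar of height 3 over one of height 2, while the single-layer version stacks them up to 5.

```r
library(ggplot2)

# Small, deterministic stand-ins for data1/data2 (made-up values)
d1 <- data.frame(x = c("a", "b"), y = c(1, 2), id = "df1")
d2 <- data.frame(x = c("b", "c"), y = c(3, 4), id = "df2")

# Two layers: each layer stacks only its own rows; layer 2 is drawn over layer 1
p_layers <- ggplot(mapping = aes(x = x, y = y, fill = id)) +
  geom_bar(data = d1, stat = "identity", position = "stack") +
  geom_bar(data = d2, stat = "identity", position = "stack")

# One layer on the combined data: rows sharing an x value stack together
p_single <- ggplot(rbind(d1, d2), aes(x = x, y = y, fill = id)) +
  geom_bar(stat = "identity", position = "stack")

# Highest bar top at x = "b" (discrete position 2)
top_at_b <- function(ld) max(ld$ymax[unclass(ld$x) == 2])

top_at_b(layer_data(p_layers, 2))  # 3: df2's bar starts at 0, over df1's bar
top_at_b(layer_data(p_single, 1))  # 5: 2 + 3, stacked
```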

I hope this gives you enough information to write your ggplot function.

That works in principle, but what do I call the second data frame? I'll edit the post to account for it. – Agata, Feb 12 '20 at 12:00

score 0 · Answer 2 · answered Feb 12 '20 at 14:38


library(tidyverse)

testplot <- function(df1, df2, mapping){

  a <- ggplot() + 
    geom_point(data = df1, mapping = mapping) +
    geom_point(data = df2, mapping = mapping)

  return(a)

}

mtcars2 <- mtcars / 100 # creating a separate dataframe to provide the function

testplot(mtcars, mtcars2, mapping = aes(x = drat, y = vs))

In your example you have "data2=data2", but geom_bar() doesn't have an argument called `data2`, only `data`. I got the code above to work, so an adaptation for your purposes should work too!
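Since the follow-up question is about geom_errorbar(): the same pattern should extend to error bars, as long as each layer gets its own data = argument and every line except the last ends in +. The "can't add ggproto object" message typically appears when a + is missing, so that a line such as geom_bar(...) + geom_errorbar(...) is evaluated on its own, without a ggplot() in front of it. A sketch, assuming hypothetical columns lower and upper that hold precomputed error limits, and a separate (also hypothetical) err_mapping argument:

```r
library(ggplot2)

# Hypothetical data: precomputed error limits in columns lower/upper
df1 <- data.frame(x = c("a", "b"), y = c(3, 5), lower = c(2.5, 4.2),
                  upper = c(3.5, 5.8), type = "A")
df2 <- data.frame(x = c("c", "d"), y = c(4, 6), lower = c(3.1, 5.5),
                  upper = c(4.9, 6.5), type = "B")

testplot <- function(df1, df2, mapping, err_mapping) {
  # Every line but the last ends in "+", so all layers join one ggplot()
  ggplot(mapping = mapping) +
    geom_bar(data = df1, stat = "identity", position = "stack") +
    geom_errorbar(data = df1, mapping = err_mapping, width = 0.2) +
    geom_bar(data = df2, stat = "identity", position = "stack") +
    geom_errorbar(data = df2, mapping = err_mapping, width = 0.2)
}

p <- testplot(df1, df2,
              mapping = aes(x = x, y = y, fill = type),
              err_mapping = aes(ymin = lower, ymax = upper))
```

The error bar layers inherit x (and y) from the main mapping and add ymin/ymax from err_mapping.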

Jack

Oh, this makes sense! It works with geom_bar, but for some reason it doesn't work the same way with geom_errorbar (it keeps saying "can't add ggproto object"). Any idea what that means? – Agata Feb 12 '20 at 17:26
Can you provide a reproducible example? And are you sure that you're adding everything to a ggplot()? (No missing plus sign?) – Jack Feb 13 '20 at 18:22

score 0 · Answer 3 · answered Feb 16 '20 at 15:47

I split my data frame because I wanted a grouped and stacked plot, and followed this question: "How to plot a Stacked and grouped bar chart in ggplot?"

The mapping has to differ between the layers so that the bars don't end up on top of each other (x = var1 for the first data frame, and x = var1 + barwidth for the second).
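The same offset idea can also be expressed in a single layer on one tidy data frame, which keeps the error bars (or anything else) down to one extra layer instead of one per data frame. A sketch under assumptions: the columns category, group, type and value and the barwidth of 0.35 are hypothetical; the numeric x position is the category's position, shifted right by barwidth for the second group, and scale_x_continuous() puts the category labels back between each pair of bars.

```r
library(ggplot2)

barwidth <- 0.35  # hypothetical bar width, as in the question

# Hypothetical tidy data: 2 categories x 2 groups x 2 stacked types
df <- expand.grid(category = c("c1", "c2"), group = c("g1", "g2"),
                  type = c("A", "B"), stringsAsFactors = FALSE)
df$value <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Numeric x: category position, shifted right by one bar width for group g2
df$xpos <- as.numeric(factor(df$category)) +
  ifelse(df$group == "g2", barwidth, 0)

# One layer: types stack within each x position; groups sit side by side
p <- ggplot(df, aes(x = xpos, y = value, fill = type)) +
  geom_bar(stat = "identity", position = "stack", width = barwidth) +
  scale_x_continuous(breaks = c(1, 2) + barwidth / 2,
                     labels = c("c1", "c2"))
```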

Anyway, I can make a plot with multiple geom_bar() layers, but the subsequent geom_errorbar() layers don't work inside the same function. I just added the error bars separately at the end, and maybe I'll look into the other options some other time.

I realise these are already functions, so perhaps they are not meant to be used this way, and maybe that's why I can't use multiple geom_errorbar() layers in one function. I just wanted my code to be more readable, because I had to plot the same thing 12 times with very minor differences, and it was very long. Perhaps there is a more elegant way to do it, though.
