In this case I think it's best to do the filtering inside the individual geom layers, since they're all different subsets of the same data source. Here are a couple of options for how to do this. I think option 1 is much cleaner code.
Option 1
If you look at the documentation for any `geom_*()` function, you'll see that there are actually 3 options for what to provide as `data`:
- If `NULL`, the default, the data is inherited from the plot data as specified in the call to `ggplot()`.
- A `data.frame`, or other object, will override the plot data. All objects will be fortified to produce a data frame. See `fortify()` for which variables will be created.
- A `function` will be called with a single argument, the plot data. The return value must be a `data.frame`, and will be used as the layer data. A `function` can be created from a `formula` (e.g. `~ head(.x, 10)`).
This last option can be used here to perform additional manipulation or filtering on your data before it is used in a particular `geom_*()` layer.
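```r
library(tidyverse)

# give a function (written as a formula) as data
mtcars %>%
  mutate(newcol = cyl * wt) %>%
  rownames_to_column("car") %>%
  ggplot() +
  # red points: cars with more than 4 cylinders and a quarter-mile time under 17 s
  geom_point(data = ~filter(.x, cyl > 4 & qsec < 17),
             aes(x = hp, y = mpg), color = "red") +
  # text labels: cars where newcol < 10 or displacement < 90
  geom_text(data = ~filter(.x, newcol < 10 | disp < 90),
            aes(x = hp, y = mpg, label = car))
```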

Created on 2022-02-20 by the reprex package (v2.0.1)
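The three forms of `data` quoted above can be compared side by side. The sketch below is only an illustration, assuming a reasonably recent ggplot2 (formula-style `data` is not available in very old versions) plus dplyr; `plot_df` is a hypothetical name. It adds one layer per form and uses `layer_data()` to count how many rows each layer actually received.

```r
library(ggplot2)
library(dplyr)

plot_df <- mtcars  # hypothetical name for the plot-level data

p <- ggplot(plot_df, aes(x = hp, y = mpg)) +
  # 1. data = NULL (the default): the layer inherits the full plot data
  geom_point(colour = "grey70") +
  # 2. a data frame: replaces the plot data for this layer only
  geom_point(data = filter(plot_df, cyl > 4 & qsec < 17), colour = "red") +
  # 3a. a function: called with the plot data, must return a data frame
  geom_point(data = function(d) filter(d, cyl == 4), colour = "blue") +
  # 3b. a formula: shorthand for the same kind of one-argument function (.x = plot data)
  geom_point(data = ~ filter(.x, cyl == 4), shape = 1, size = 3)

# layer_data(p, i) returns the data computed for layer i, so count its rows
sapply(1:4, function(i) nrow(layer_data(p, i)))
```

Layer 1 should receive every row of `plot_df`, and layers 3 and 4 should receive the same subset, which shows that the formula is just a compact way of writing a one-argument function.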
Option 2
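Alternatively, you can capture the output of the {magrittr} pipe (`%>%`) as `.` and filter inside each `geom_*()`'s `data` argument. For this to work, wrap the whole `ggplot()` call, including the `+ geom_*()` layers, in curly braces `{}`, and also wrap the pipe output in curly braces, like this: `{.}`. The outer braces matter for two reasons: `%>%` binds more tightly than `+`, so without them only `ggplot()` receives the piped data and `.` is undefined in the later layers; and inside braces the pipe output is not inserted as the first argument of `ggplot()`. In some cases it will work fine without this treatment, since `data` is the first argument of `ggplot()`, but not always, depending on how you construct the call. Therefore it's safest to use the `{}` approach.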
This somewhat unintuitive behaviour of the {magrittr} pipe is lightly documented here.
There's also a nice explanation of it in this answer.
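The two pipe rules involved can be seen in isolation with a small sketch that uses only a base vector (it assumes magrittr's `%>%`, not R's native `|>`, which follows different rules). As the magrittr documentation describes it, a `.` that appears only nested inside another call does not stop the left-hand side from also being inserted as the first argument, whereas a right-hand side wrapped in `{}` is treated like the body of a one-argument function whose argument is `.`. The variable names are illustrative.

```r
library(magrittr)

# `.` appears only nested inside length(), so magrittr ALSO inserts the
# left-hand side as the first argument: this evaluates list(1:3, n = length(1:3))
with_insert <- 1:3 %>% list(n = length(.))

# Braces make the right-hand side a lambda body: the left-hand side is bound
# to `.` but not inserted as a first argument, so this is list(n = length(1:3))
no_insert <- 1:3 %>% { list(n = length(.)) }

str(with_insert)
str(no_insert)
```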
You can combine multiple conditions in the `filter()` operation by connecting them with the logical OR (`|`) or AND (`&`) operators.
```r
library(tidyverse)

# works with or without {}
mtcars %>%
  mutate(newcol = cyl * wt) %>%
  rownames_to_column("car") %>%
  {
    ggplot() +
      geom_point(data = {.} %>% filter(cyl > 4 & qsec < 17),
                 aes(x = hp, y = mpg), color = "red") +
      geom_text(data = {.} %>% filter(newcol < 10 | disp < 90),
                aes(x = hp, y = mpg, label = car))
  }

# error without {}
mtcars %>%
  mutate(newcol = cyl * wt) %>%
  rownames_to_column("car") %>%
  ggplot() +
  geom_point(data = filter(., cyl > 4 & qsec < 17),
             aes(x = hp, y = mpg), color = "red") +
  geom_text(data = filter(., newcol < 10 | disp < 90),
            aes(x = hp, y = mpg, label = car))
#> Error in filter(., cyl > 4 & qsec < 17): object '.' not found
```
Created on 2022-02-20 by the reprex package (v2.0.1)
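The `object '.' not found` error comes from how R parses the expression: `%>%`, like every `%any%` operator, has higher precedence than `+`. A quick way to check this is to parse, but not evaluate, a stripped-down version of each pipeline with `quote()`; `x`, `f` and `g` are placeholder names, not real functions.

```r
# quote() only parses the code; nothing is evaluated, so no packages are needed.
# x, f and g are placeholder names.
without_braces <- quote(x %>% f() + g(.))
with_braces    <- quote(x %>% { f() + g(.) })

without_braces[[1]]  # top-level call is `+`, not `%>%`
without_braces[[2]]  # left operand of `+` is the pipe x %>% f()
without_braces[[3]]  # right operand g(.) sits outside the pipe, so `.` is unbound

with_braces[[1]]     # top-level call is `%>%`: the whole braced block is piped
```

With the braces, `+ g(.)` becomes part of the pipe's right-hand side, which is why the first Option 2 example can refer to `.` inside the geoms.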