Is there a way to filter within ggplot itself? That is, say I want to do this:

p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, species)) +
     geom_point(size = 4, shape = 4) +
     geom_point(size = 1, shape = 5) # do this only for data that meets some condition, e.g. Species == "setosa"

I know there are hacks I can use, like setting size = 0 if Species != "setosa" or resetting the data as shown below, but these are all hacks.

p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, species)) +
     geom_point(size = 4, shape = 4) +
     geom_point(data = iris %>% filter(Species == "setosa"), colour = "red") +
     geom_point(data = iris %>% filter(Species == "versicolor"), shape = 5)

Basically, I have a chart where certain things should be displayed only if a certain criterion is met. Right now I'm using the hack above to accomplish this, and it's keeping me up at night, my soul slowly dying from the mess I've created. Needless to say, any help would be very much appreciated!

Edit

I'm afraid my example may have been too simplistic. Basically, given ggplot(data = ...), how do I add these layers, all using the data bound to the ggplot obj:

  1. Plot curves
  2. Plot dots on points that meet criteria #1. These dots would be in red. Points that don't meet the criteria don't get a point drawn (Not a hack like point size set to zero, or alpha set to 0)
  3. Add labels to points that meet criteria #2.

Criteria #1 and #2 could be anything, e.g. label only outlier points, or draw in red only those points that fall outside a specific range.

I don't want to

  1. bind a new dataset à la ggplot(data=subset(iris, Species=="setosa"),...) or ggplot(data=filter(iris,Species=="setosa"),...).
  2. use a scaling hack (like setting scale=manual so that whatever doesn't meet the criteria gets a NULL/NA, etc.). For example, if I had 1000 points and only 1 point met a given criterion, I want it to apply its plotting logic only to that one point instead of looking at, and styling, all 1000 points.
adilapapaya
  • The typical choice is usually to make your condition an *aesthetic* within the layer, possibly while setting the scales yourself. E.g. `geom_point(aes(colour = Species == "setosa")) + scale_color_manual(values = c("black", "red"))`. – David Robinson Mar 04 '16 at 21:29
  • An alternative could be to use a subset of your data, like `geom_point(data=subset(iris, Species=="setosa"), size = 1, shape = 5)`. – lukeA Mar 04 '16 at 21:41
  • @lukeA how is the `subset` solution different from the `filter` solution? – adilapapaya Mar 04 '16 at 22:26
  • @DavidRobinson please see my edit. This still looks like a hack in the sense that I'm not telling ggplot to apply a specific thing to only the data that meets a certain criteria but rather I'm just dividing the data into two groups and styling them differently. – adilapapaya Mar 04 '16 at 22:28
  • What you call hacks may basically be the way to go. Why would you need other options if it works well? :) – lukeA Mar 04 '16 at 22:34
  • Yes, as I understand it, the only way to do this is to pass `data =` with various subsets. – David Robinson Mar 04 '16 at 22:35
  • ok, thanks for the input~! :) – adilapapaya Mar 05 '16 at 00:49

2 Answers

Apparently layers now accept a function as the data argument, so you could use that:

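library(ggplot2)
library(dplyr)

# Returns a function that ggplot2 applies to the plot's default data.
# `condition` is a one-sided formula; note that filter_() is deprecated in current dplyr.
pick <- function(condition){
  function(d) d %>% filter_(condition)
}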

ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, species)) +
  geom_point(size = 4, shape = 4) +
  geom_point(data = pick(~Species == "setosa"), colour = "red") +
  geom_point(data = pick(~Species == "versicolor"), shape = 5)
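
Since `filter_` is deprecated, here is a minimal sketch of the same helper written with tidy evaluation instead. It assumes a dplyr version that re-exports `enquo` (0.7 or later); this `pick` variant is a hypothetical rewrite, not part of ggplot2 or dplyr. The condition is captured once in the outer function, so the tilde is dropped at the call site.

```r
library(ggplot2)
library(dplyr)

# Hypothetical rewrite of pick(): capture the condition as a quosure,
# then return a function that ggplot2 calls with the plot's default data.
pick <- function(condition) {
  condition <- enquo(condition)
  function(d) filter(d, !!condition)
}

p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point(size = 4, shape = 4) +
  geom_point(data = pick(Species == "setosa"), colour = "red") +
  geom_point(data = pick(Species == "versicolor"), shape = 5)
```

Each layer still starts from the data bound to `ggplot()`; only the rows that pass the condition reach that layer.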
baptiste
  • Perfect. Exactly what I was looking for. Thanks baptiste! – adilapapaya Mar 05 '16 at 15:52
  • Very good answer, especially since ggplot2 v2+ no longer supports the unofficial (i.e. undocumented) `subset` argument to filter data for each layer. See https://github.com/hadley/ggplot2/issues/1498. – Holger Brandl Apr 21 '16 at 10:06
  • Given that filter_ is deprecated, it may be necessary to do something like the following in the future: `pick <- function(condition){ function(d) d %>% filter(!!enquo(condition)) }`. If this idiom is used, the tilde must be removed from the references to columns within calls to `pick`. – Steve Walker Jan 14 '19 at 15:27
  • Is there a way to do this without the `pick` routine? – Cristóbal Alcázar Jun 24 '21 at 20:08

You can filter data with an anonymous function using the ~ formula notation:

library(ggplot2)
library(dplyr)

ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, species)) +
    geom_point(size = 4, shape = 4) +
    geom_point(data = ~filter(.x, Species == "setosa"), colour = "red") +
    geom_point(data = ~filter(.x, Species == "versicolor"), shape = 5)
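
The same idea seems to cover the three layers listed in the question's edit (curves, red dots for criterion #1, labels for criterion #2). Below is a rough sketch using plain `function(d)` closures and base R subsetting, so dplyr is not required. The two thresholds are made up purely for illustration and stand in for whatever the real criteria are.

```r
library(ggplot2)

# Hypothetical criteria, chosen only for illustration:
#   criterion #1: Sepal.Length > 7  -> draw a red dot
#   criterion #2: Sepal.Width  > 4  -> add a text label
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  # layer 1: curves, drawn from all of the data bound to ggplot()
  geom_line(aes(group = Species)) +
  # layer 2: only rows meeting criterion #1 reach this layer
  geom_point(data = function(d) d[d$Sepal.Length > 7, ], colour = "red") +
  # layer 3: only rows meeting criterion #2 get a label
  geom_text(
    data = function(d) d[d$Sepal.Width > 4, ],
    aes(label = Species),
    vjust = -1
  )
```

Rows that fail a criterion are never passed to that layer, so nothing is drawn for them and no size, alpha, or scale workaround is involved.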

Created on 2021-11-15 by the reprex package (v2.0.0)

JohannesNE