stat_ecdf() is a good starting point for this visualization, but there are a few modifications we need to make.
- In a CDF, y represents the proportion of values less than or equal to a given value of x. Since you're looking for the proportion of values greater than x, we can instead invert the output. For this we make use of the variables computed by the stat. These used to be accessed through the .. or stat() nomenclature (e.g. ..y.. or stat(y)). Now the preferred nomenclature is after_stat() (also described in this and this blog posts). So the final code specifies this inversion inside the aes() of stat_ecdf() by setting y = 1 - after_stat(y), meaning, "once you've calculated the y value with the stat, subtract that value from 1 before returning it for plotting".
- You want to see the actual count rather than a proportion. For this, one easy option is to use a second axis where you specify the transformation by simply multiplying by the number of observations. To facilitate this, I calculate this value outside of the ggplot() call because it's cumbersome to access it within ggplot.
- Since you are asking for a value of y that is the count of observations with a value greater than or equal to x, we need to shift the default output of stat_ecdf(). Here, I do this by simply specifying aes(carb + 1). This works because carb only takes integer values, so carb + 1 > x is the same condition as carb >= x. I show both versions below for comparison.
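As a quick sanity check on the three adjustments above, here is a minimal base-R sketch (no plotting) of the numbers the two plots should show. It assumes the built-in mtcars data; the names n_obs, ecdf_carb and xs are illustrative and not part of the original code.

```r
# Base-R check of the values the plots are expected to show.
# n_obs, ecdf_carb and xs are illustrative names, not from the answer.
n_obs <- nrow(mtcars)            # number of observations
ecdf_carb <- ecdf(mtcars$carb)   # step function: proportion of carb <= x
xs <- sort(unique(mtcars$carb))  # observed carb values

# 1 - ECDF gives the proportion of observations with carb > x
prop_greater <- 1 - ecdf_carb(xs)

# Multiplying by the number of observations turns the proportion into a count
count_greater <- prop_greater * n_obs

# Shifting by 1 turns "greater than" into "greater than or equal to";
# this is only valid because carb is integer-valued
count_at_least <- (1 - ecdf_carb(xs - 1)) * n_obs

data.frame(carb = xs, count_greater, count_at_least)
```

For example, 25 cars have more than one carburetor, while all 32 have at least one, which is the difference between the two plots at x = 1.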
Note: I'm showing the points with the line to help illustrate the actual y value at each x, since geom = "step" (the default geom of stat_ecdf()) somewhat obscures it.
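library(tidyverse)

# Number of observations, computed up front for the secondary (count) axis
n <- nrow(mtcars)

mtcars %>%
  ggplot(aes(carb)) +
  # 1 - ECDF: proportion of observations with carb > x
  stat_ecdf(aes(y = 1 - after_stat(y)), geom = "point") +
  stat_ecdf(aes(y = 1 - after_stat(y)), geom = "step") +
  # The secondary axis rescales the proportion to a count
  # (ggplot2 >= 3.5.0 names this argument `transform` instead of `trans`)
  scale_y_continuous("Proportion", position = "right",
                     sec.axis = sec_axis(name = "Count", trans = ~ .x * n)) +
  scale_x_continuous(limits = c(0, NA), breaks = 0:8) +
  ggtitle("y = count with carb > x")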

mtcars %>%
  # Shifting x by 1 turns "carb > x" into "carb >= x" for integer-valued carb
  ggplot(aes(carb + 1)) +
  stat_ecdf(aes(y = 1 - after_stat(y)), geom = "point") +
  stat_ecdf(aes(y = 1 - after_stat(y)), geom = "step") +
  scale_y_continuous("Proportion", position = "right",
                     sec.axis = sec_axis(name = "Count", trans = ~ .x * n)) +
  scale_x_continuous(limits = c(0, NA), breaks = 0:9) +
  ggtitle("y = count with carb >= x")
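
If the count is all you need, a different route is to skip stat_ecdf() and tabulate the counts yourself before plotting. This is not what the code above does. It is only a sketch, assuming x needs to be evaluated just at the integers 0 to 8; the names xs, counts and p are hypothetical.

```r
library(ggplot2)

# Tabulate, for each integer x, how many cars have carb > x
# (xs, counts and p are illustrative names, not from the answer)
xs <- 0:8
counts <- data.frame(
  x = xs,
  n_greater = vapply(xs, function(x) sum(mtcars$carb > x), integer(1))
)

# Plot the precomputed counts directly, so no secondary axis is needed
p <- ggplot(counts, aes(x, n_greater)) +
  geom_step() +
  geom_point() +
  scale_x_continuous(breaks = xs) +
  labs(y = "Count", title = "y = count with carb > x")

p
```

The trade-off is that the proportion axis is lost and the grid of x values has to be chosen by hand, whereas stat_ecdf() derives the steps from the data.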

Created on 2022-09-30 by the reprex package (v2.0.1)