Stop geom_density_ridges from showing non-existent tail values

Question

When I use geom_density_ridges(), the plot often ends up showing long tails of values that don't exist in the data.

Here's an example:

library(tidyverse)
library(ggridges)

data("lincoln_weather")

# Remove all negative values for "Minimum Temperature"
d <- lincoln_weather[lincoln_weather$`Min Temperature [F]`>=0,]

ggplot(d, aes(`Min Temperature [F]`, Month)) +
  geom_density_ridges(rel_min_height=.01)

As you can see, January, February, and December all show negative temperatures, but there are no negative values in the data at all.

Of course, I can add limits to the x-axis, but that doesn't solve the problem because it just truncates the existing erroneous density.

ggplot(d, aes(`Min Temperature [F]`, Month)) +
  geom_density_ridges(rel_min_height=.01) +
  xlim(0,80)

Now the plot makes it look like January and February contain values of 0 (they contain none). It also makes it look like 0 degrees occurred often in December, when in reality there was only 1 such day.

How can I fix this?

Maybe you don't want a density estimate? What exactly are you expecting? Most density estimators assume your data is continuous over all real numbers. They don't expect a bounded range. You would need a different kind of estimator for that, because right now there is nothing to "fix": the statistical method is working as it was designed. — MrFlick, Apr 18 '18 at 18:07
Oh, that makes sense. I guess a histogram would make more sense for a bounded range. This problem arose because I was working with data that can't have negative numbers but does have many zero and near-zero numbers. I suppose a density plot just isn't the right tool to visualize that. — John J., Apr 18 '18 at 18:12
@MrFlick Actually, cutting density estimates at the ends of the data ranges is not that unusual. Violin plots usually do this. The same can be done with `stat_density()`, see [here.](https://stackoverflow.com/a/50011428/4975218) — Claus Wilke, Apr 24 '18 at 22:17

Claus Wilke · Accepted Answer · 2018-04-24T22:20:24.310

One option is to use stat_density() instead of stat_density_ridges(). There are some things that stat_density() can't do, such as drawing vertical lines or overlaying points, but on the flip side it can do some things that stat_density_ridges() can't do, such as trimming the distributions to the data ranges.

# Remove all negative values for "Minimum Temperature"
d <- lincoln_weather[lincoln_weather$`Min Temperature [F]`>=0,]

# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
ggplot(d, aes(`Min Temperature [F]`, Month, group = Month, height = after_stat(density))) +
  geom_density_ridges(stat = "density", trim = TRUE)
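
To see what trimming changes, here is a minimal base R sketch. It uses simulated non-negative data rather than the weather data, and it only illustrates the general idea with `stats::density()`; it is not meant to reproduce what ggridges or ggplot2 do internally. By default `density()` evaluates the estimate `cut = 3` bandwidths beyond the extremes of the data, which is where the tails over non-existent values come from. Setting `cut = 0` restricts the evaluation grid to the observed range.

```r
# Simulated non-negative data (an assumption for illustration only)
set.seed(1)
x <- rexp(200, rate = 0.1)

# Default: the grid extends 3 bandwidths past min(x) and max(x)
d_default <- density(x)

# cut = 0: the grid starts at min(x) and ends at max(x)
d_trimmed <- density(x, cut = 0)

# The default grid reaches below zero although no value in x is negative
min(x)
min(d_default$x)

# The trimmed grid covers exactly the observed range
range(x)
range(d_trimmed$x)
```

Note that trimming only limits where the curve is drawn. The kernel still smooths across the boundary, so the trimmed curve does not integrate to 1 over the displayed range.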

As an alternative, you could draw a point rug. Maybe that serves your purpose as well or better:

ggplot(d, aes(`Min Temperature [F]`, Month)) +
  geom_density_ridges(rel_min_height = 0.01, jittered_points = TRUE,
                      position = position_points_jitter(width = 0.5, height = 0),
                      point_shape = "|", point_size = 2,
                      alpha = 0.7)

Note: these two approaches cannot currently be combined. That would require some modifications to the stat code.

You can combine the two approaches by adding a new point layer with its own aesthetics: `` ... + geom_point(aes(`Min Temperature [F]`, Month), inherit.aes = FALSE, ...) `` — yuk, May 29 '22 at 16:51

score 9 · Answer 2 · answered Apr 18 '18 at 18:52

Well, turns out I should have just read the documentation more closely. The key part is:

"The ggridges package provides two main geoms, geom_ridgeline and geom_density_ridges. The former takes height values directly to draw ridgelines, and the latter first estimates data densities and then draws those using ridgelines."

There are multiple ways to handle this issue. Here is one:

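# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
ggplot(d, aes(`Min Temperature [F]`, Month, height = after_stat(density))) +
  geom_density_ridges(stat = "binline", binwidth = 1,
                      draw_baseline = FALSE)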
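
As a rough illustration of why a binned display avoids the problem, here is a base R sketch with `hist()`. The data are simulated (an assumption, not the weather data), and this is not the ggridges implementation. Each bar reflects only the observations that fall in its bin, so nothing is drawn below the smallest value and the height at the lower bound matches the actual number of observations there.

```r
# Simulated non-negative integer data with some zeros (illustration only)
set.seed(42)
x <- round(rexp(300, rate = 0.2))

# Bins of width 1 centred on the integers, spanning only the observed range
breaks <- seq(min(x) - 0.5, max(x) + 0.5, by = 1)
h <- hist(x, breaks = breaks, plot = FALSE)

# The first bin counts exactly the observations at the minimum value
h$counts[1]
sum(x == min(x))

# With a bin width of 1, the density heights sum to 1
sum(h$density)
```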
