
I would like to add frequencies on a second y-axis to a ridgeline plot using ggplot2 and ggridges

I found a tutorial that adds the frequencies as numbers with geom_text (https://rdrr.io/cran/ggridges/man/stat_binline.html); however, I would prefer to show them on a second y-axis.

Of course, I would also very much appreciate solutions outside ggridges that produce a similar plot.

Example data:

library(ggplot2)
library(ggridges)
library(lubridate)

# datapoints
data_timepoint <- data.frame(type=factor(c("A","B","C","D")),
                             start=as.Date(c("1990-01-01","2000-01-01","2010-01-01","2012-01-01")),
                             stop=as.Date(c(rep("2022-01-01",4))))

# frequencies
data_freq <- data.frame(type=c("A","A","B","C","D","D","D"),
                        year=ymd(year(as.Date(c("1991-01-01","1991-01-01","2005-01-01","2016-01-01","2013-01-01","2013-01-01","2015-01-01"))),truncated=2L))

# plot
ggplot(data_timepoint) +
  geom_rect(aes(xmin=start, xmax=stop,
                ymin=type, ymax=as.numeric(type)+0.9), fill="lightblue") +
  geom_density_ridges(data=data_freq, aes(x=year,y=type),stat = "binline",
                      bins = 1, scale = 0.95, draw_baseline = FALSE, alpha=.5, binwidth=10,center=20) +
  scale_x_date(date_breaks = "1 year",date_labels = "%Y") +
  theme(axis.text.x = element_text(angle = 90),
        axis.text.y = element_text(vjust = -2)) +
  labs(title="",y="Type",x="Year")

Created on 2022-06-03 by the reprex package (v2.0.1)

Desired output: (image not reproduced here)

ava
  • You have so few data points that I can hardly see the usefulness of a density plot. Or maybe you have many more data points and could share sample data that resembles them more closely (maybe use the example from `?geom_density_ridges`)? – tjebo Jun 03 '22 at 08:41
  • Thank you. Yes, the real data has many more data points. I thought the sample data should be small, and I am using a long data format, so I gave sample data that best represents my data structure. – ava Jun 03 '22 at 08:50
  • related https://stackoverflow.com/questions/6957549/overlaying-histograms-with-ggplot2-in-r and https://stackoverflow.com/questions/37404002/geom-density-to-match-geom-histogram-binwitdh – tjebo Jun 03 '22 at 08:53

1 Answer


Technically, you don't need a secondary y-axis: you just want to show frequency instead of density. In general, you can show frequency by using ..count.. or, with the newer syntax, after_stat(count) as your y aesthetic. ggridges doesn't seem to offer count as a computed stat, so you could fake the ggridges look with facets instead.

The example is adapted from ?geom_density_ridges

library(ggplot2)

## swap x and y
ggplot(diamonds, aes(price)) +
## use y = after_stat(count) to show your frequency
  geom_density(aes(y = after_stat(count))) +
## change the y axis position to the right
## (spelled out in full: abbreviations such as "r" may be rejected by newer ggplot2 versions)
  scale_y_continuous(expand = c(0.01, 0), position = "right") +
  scale_x_continuous(expand = c(0.01, 0)) +
## add facet, and put label to the left
  facet_wrap(~cut, ncol = 1, strip.position = "left")

Created on 2022-06-03 by the reprex package (v2.0.1)
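To see what after_stat(count) actually puts on the y-axis, you can inspect the computed layer data. This is only a minimal sketch, assuming a single unfaceted density layer; the names p_count and d are just illustrative. As far as I know, the count computed by stat_density is the density multiplied by the number of observations, which the last line checks.

```r
library(ggplot2)

# one unfaceted density layer, drawn as frequency instead of density
p_count <- ggplot(diamonds, aes(price)) +
  geom_density(aes(y = after_stat(count)))

# layer_data() returns the values computed by the stat for the first layer
d <- layer_data(p_count)
head(d[, c("x", "density", "count", "y")])

# check: count should equal density times the number of observations
all.equal(d$count, d$density * nrow(diamonds))
```

So the curve keeps its shape; only the scale of the y-axis changes.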

If you go a step further and let the facets overlap (which is the principle of a ridge plot: overlapping facets of a density plot), you will see that adding an axis guide to a classic ridge plot makes those guides overlap between the ridges (your facets). This doesn't look good.

This is irrespective of your stat and will also happen with stat = "binline".

p <- ggplot(diamonds, aes(price)) +
  geom_density(aes(y = after_stat(count))) +
  scale_y_continuous(expand = c(0.01, 0), position = "right") +
  scale_x_continuous(expand = c(0.01, 0)) +
  facet_wrap(~cut, ncol = 1, strip.position = "left") +
## let the facets overlap (make background and strip transparent)
  theme(panel.spacing.y = unit(-.3, "in"), 
        strip.background = element_blank(),
        panel.background = element_blank(), 
        panel.grid.major = element_blank())

cowplot::stamp_bad(p)

To add your desired rectangle annotation, your approach is perfectly fine. Is the data actually structured as in your example, or did you create the second data frame beforehand based on the first one? (That would be excellent, and well done for doing so.)

A few more comments are in the code:

ggplot() +
  ## use different y - slightly depending on your desired look
  geom_rect(data = data_timepoint, aes(xmin = start, xmax = stop, ymin = 0, ymax = 1), fill = "lightblue") +
  geom_histogram(data = data_freq, aes(year)) +
  ## added pretty labels
  scale_y_continuous(expand = c(0, 0), position = "right", breaks = scales::breaks_pretty(n = 2)) +
  ## keep x as date
  scale_x_date(expand = c(0, 0)) +
  facet_wrap(~type, ncol = 1, strip.position = "left")

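If you want roughly one bar per year, you can set the bin width explicitly. This is only a sketch under my own assumptions: on a date axis binwidth is given in days, so 365 stands in for "about one year", and the alpha value is just a styling choice. The sample data is rebuilt here without lubridate so the block runs on its own, and hist_data is an illustrative name for the counts behind the right-hand axis.

```r
library(ggplot2)

# sample data from the question, rebuilt with base R only
data_timepoint <- data.frame(
  type  = factor(c("A", "B", "C", "D")),
  start = as.Date(c("1990-01-01", "2000-01-01", "2010-01-01", "2012-01-01")),
  stop  = as.Date(rep("2022-01-01", 4))
)
data_freq <- data.frame(
  type = factor(c("A", "A", "B", "C", "D", "D", "D")),
  year = as.Date(c("1991-01-01", "1991-01-01", "2005-01-01", "2016-01-01",
                   "2013-01-01", "2013-01-01", "2015-01-01"))
)

p_hist <- ggplot() +
  geom_rect(data = data_timepoint,
            aes(xmin = start, xmax = stop, ymin = 0, ymax = 1),
            fill = "lightblue") +
  # binwidth is in days on a date axis; 365 is an assumed "about one year"
  geom_histogram(data = data_freq, aes(year), binwidth = 365, alpha = 0.5) +
  scale_y_continuous(expand = c(0, 0), position = "right",
                     breaks = scales::breaks_pretty(n = 2)) +
  scale_x_date(expand = c(0, 0)) +
  facet_wrap(~type, ncol = 1, strip.position = "left")

# per-panel bin counts computed for the histogram layer (layer 2)
hist_data <- layer_data(p_hist, 2)
hist_data[hist_data$count > 0, c("PANEL", "count")]
```

Printing p_hist draws the plot. The rectangle height of 1 is in count units here, so adjust ymax if your real counts are much larger.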

tjebo
  • Thank you. I am trying to get a histogram distribution, which is why I used `stat = "binline"`. I can probably combine `stat_bin` with `geom_rect` to get a plot similar to the desired output (i.e., a histogram distribution on a geom). – ava Jun 03 '22 at 09:18
  • Thank you. I see your point regarding the classic ridge plot. I am trying to represent the distribution using histograms, not density plots. Can your approach be combined with `geom_rect` as shown in the desired output above? – ava Jun 03 '22 at 10:12
  • No problem. Unfortunately, I did not manage to do it. Yes, it simply highlights the time span (first to last occurrence). – ava Jun 10 '22 at 17:50
  • @ava I have added a suggestion with your geom_rect. Your approach seemed perfectly fine to me. Not sure if this is what you envisioned. – tjebo Jun 11 '22 at 11:27