1

Here is a minimal example of my problem.

I would like a vertical line passing through the two peaks in the graph and a label of its corresponding x axis.

    library(dplyr)
    library(ggplot2)

   data.frame(Values=c(1,1,1,2,2,3,3,5,4,5,5,5,5,6,6,6,6,5,5,5,
                        4,4,4,7,8,8,8,8,8,8,8,4,3,2,2,3,5,6,7,4,
                        2,2,3,9,9,9,9,9,9,9,9,9,9,5,1,3,3,1))%>%
    ggplot(aes(x=Values))+
      geom_density()+
      theme_classic()
Ginko-Mitten
  • 304
  • 1
  • 11
  • 1
    My suggestion (not fully worked out): run `density()` on the data: find the peaks using something like the answers from [this question](https://stackoverflow.com/questions/12945113/finding-peaks-in-vector); use `geom_vline()`, `geom_hline()` to add the annotations. – Ben Bolker Jan 25 '21 at 14:50

3 Answers3

4

Adapting the code from an answer I gave to a similar question should also work here. We first extract the layer data from the plot itself to make sure we're not accedentily assuming different bandwidths or kernels than ggplot2. Like the answer Ben Bolker linked, this also makes use of run-length encoding of the differences.

library(ggplot2)

df <- data.frame(Values=c(1,1,1,2,2,3,3,5,4,5,5,5,5,6,6,6,6,5,5,5,
                    4,4,4,7,8,8,8,8,8,8,8,4,3,2,2,3,5,6,7,4,
                    2,2,3,9,9,9,9,9,9,9,9,9,9,5,1,3,3,1))
g <- ggplot(df, aes(x=Values))+
  geom_density()+
  theme_classic()

dens <- layer_data(g, 1)
dens <- dens[order(dens$x),]

# Run length encode the sign of difference
rle <- rle(diff(as.vector(dens$y)) > 0)
# Calculate startpoints of runs
starts <- cumsum(rle$lengths) - rle$lengths + 1
# Take the points where the rle is FALSE (so difference goes from positive to negative) 
maxima_id <- starts[!rle$values]

maxima <- dens[maxima_id,]

g + geom_vline(data = maxima,
               aes(xintercept = x))

Created on 2021-01-25 by the reprex package (v0.3.0)

teunbrand
  • 33,645
  • 4
  • 37
  • 63
3

Building on Ben Bolker's comment:

This code doesn't strictly tell you all the mathematical local maxima, it just tells you points which are the max within a width-20 window whose endpoints don't equal the max.

values <- c(1,1,1,2,2,3,3,5,4,5,5,5,5,6,6,6,6,5,5,5,
                        4,4,4,7,8,8,8,8,8,8,8,4,3,2,2,3,5,6,7,4,
                        2,2,3,9,9,9,9,9,9,9,9,9,9,5,1,3,3,1)

dens <- density(values)

i1 <- 
  zoo::rollapply(dens$y, 20, function(x){
    if(head(x, 1) < max(x) & tail(x, 1) < max(x))
      which.max(x)
    else 
      NA
  })

max_inds <- unique(i1 + seq_along(i1) - 1)
x_at_max <- dens$x[max_inds[!is.na(max_inds)]]


data.frame(Values=c(1,1,1,2,2,3,3,5,4,5,5,5,5,6,6,6,6,5,5,5,
                        4,4,4,7,8,8,8,8,8,8,8,4,3,2,2,3,5,6,7,4,
                        2,2,3,9,9,9,9,9,9,9,9,9,9,5,1,3,3,1))%>%
    ggplot(aes(x=Values))+
      geom_density()+
      theme_classic() +
    geom_vline(xintercept = x_at_max)

enter image description here

Edit:

Actually, looks like this gives the same result as the approach in the answer linked by Ben Bolker. You should probably use that one. I assume it's faster, and also it will give you all the local maximua even if they're not the max of a width-20 window.

r <- rle(dens$y)
which(rep(x = diff(sign(diff(c(-Inf, r$values, -Inf)))) == -2, 
          times = r$lengths))
# [1] 240 378

max_inds[!is.na(max_inds)]
# [1] 240 378
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
0

For a slight variation, this gets the data out of the ggplot2 object, and then first finds the highest value overall, then the highest value for x above 7.

There's definitely more elegant ways to achieve the same.

The main difference here is that you are getting the data out of the ggplot2 object itself, and then you can process it as you like.

library(dplyr)
library(ggplot2)

plot <- data.frame(Values=c(1,1,1,2,2,3,3,5,4,5,5,5,5,6,6,6,6,5,5,5,
                    4,4,4,7,8,8,8,8,8,8,8,4,3,2,2,3,5,6,7,4,
                    2,2,3,9,9,9,9,9,9,9,9,9,9,5,1,3,3,1)) %>% 
  ggplot(aes(x=Values))+
  geom_density()+
  theme_classic() 
  
p <- ggplot_build(plot)


plot +
  geom_vline(xintercept = p$data[[1]]$x[which.max(p$data[[1]]$y)]) +
  geom_vline(xintercept = p$data[[1]]$x[p$data[[1]]$x>7][which.max(p$data[[1]]$y[p$data[[1]]$x>7])]) 

Created on 2021-01-25 by the reprex package (v0.3.0)

giocomai
  • 3,043
  • 21
  • 24