
I apologize if this question has been asked already. I'm a beginner to R and do not have an advanced stats background. I am trying to determine the number of peaks (maximums) for my data in R. For those familiar with GraphPad Prism, essentially I am trying to find the "Number of Peaks" by doing an Area Under the Curve analysis and using a threshold of y=2. My dataframe is below (called example1).

time a  b   c   d   e   f
0   248 413 473 370 501 235
3   243 408 468 356 496 237
6   243 406 476 354 503 235
9   243 412 469 352 500 232
12  245 411 467 355 504 235
15  241 420 483 348 499 227
18  239 416 477 344 493 230
21  236 413 478 338 490 227
24  234 406 475 337 481 229

My x variable would be the first column and my y variable would be the rest of the columns (very large dataset- 50+ dependent variables). For each dependent variable or column, I am trying to find the number of peaks (local maxima). I need to make my y threshold = 2.

I have already plotted my data (code below) using ggplot by melting my dataframe.

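# Load the packages used below (melt() is assumed to come from reshape2)
library(reshape2)
library(ggplot2)

# Melt data to long format
melted <- melt(example1, id.vars="time")

# Create line graph, one line per variable
ggplot(data=melted, aes(x=time, y=value, group=variable)) + 
  geom_line(show.legend = TRUE)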

How can I calculate and output the number of peaks per column (with the y=2 threshold)? Will I need to do an area under the curve analysis? I'm hoping to calculate something like this (number of peaks per column)... a = 0, b = 3, c = 0, d = 6, e = 1, f = 0 but the output could be something like 0, 3, 0, 6, 1, 0

I do not need to produce another graph. I just need an output of number of peaks per dependent variable.

tshast2
  • Could you clarify what you mean by a peak? Are you looking for local maxima or the global one? This link may help: http://mathonline.wikidot.com/local-maxima-and-minima-and-absolute-maxima-and-minima. – Rémi Coulaud Jun 30 '20 at 18:04
  • My apologies- I meant local maxima. Thank you – tshast2 Jun 30 '20 at 22:25
  • Does this answer your question? [Finding local maxima and minima](https://stackoverflow.com/questions/6836409/finding-local-maxima-and-minima) – Rémi Coulaud Jul 01 '20 at 06:58

2 Answers


There is a findpeaks() function in the pracma package that is very useful for this kind of task; see the package documentation for details. You can specify the threshold or go with the default settings. There are also parameters that help ignore or include peaks that span multiple points.

You feed findpeaks() the time-series vector (make sure it is ordered by your x axis first), and it returns a matrix with one row per peak. For each peak you get the maximum (y value), its index, the index where the peak begins, and the index where it ends. See the usage below with your example1 dataset:

library(pracma)
peak_info <- lapply(example1[,2:7], findpeaks, threshold=2)

> peak_info
$a
     [,1] [,2] [,3] [,4]
[1,]  245    5    4    9

$b
     [,1] [,2] [,3] [,4]
[1,]  420    6    5    9

$c
     [,1] [,2] [,3] [,4]
[1,]  476    3    2    5
[2,]  483    6    5    7

$d
     [,1] [,2] [,3] [,4]
[1,]  355    5    4    9

$e
     [,1] [,2] [,3] [,4]
[1,]  503    3    2    4
[2,]  504    5    4    9

$f
     [,1] [,2] [,3] [,4]
[1,]  237    2    1    4
[2,]  235    5    4    6
[3,]  230    7    6    8
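
Since the second column of each matrix is an index into the vector rather than a time value, it can help to map it back to the time column. The following is only a sketch: the matrix for column c is copied by hand from the output above so that it runs without pracma, and the column labels (height, index, start, end) and the name time_points are my own, not something findpeaks() provides.

```r
# Time column from the question's example1 data
time_points <- c(0, 3, 6, 9, 12, 15, 18, 21, 24)

# Hand-copied findpeaks() result for column c; the column names are
# hypothetical labels added here for readability
peaks_c <- matrix(c(476, 3, 2, 5,
                    483, 6, 5, 7),
                  ncol = 4, byrow = TRUE,
                  dimnames = list(NULL, c("height", "index", "start", "end")))

# Look up the time at which each peak occurs
peak_times <- time_points[peaks_c[, "index"]]
print(peak_times)
```

For column c this puts the two peaks at time 6 and time 15.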

If you just want to know the number of peaks, you can run the following:

> unlist(lapply(peak_info, nrow))

a b c d e f 
1 1 2 1 2 3 
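
One caveat, as a hedged sketch rather than a definitive fix: a column with no peaks appears to be dropped by unlist(lapply(peak_info, nrow)), which is what you would expect if findpeaks() yields NULL for such a column (that is my assumption here). A small helper can turn those into explicit zeros. The name n_peaks and the demo list are made up for illustration; the list is hand-built so the snippet runs without pracma.

```r
# Hypothetical helper: number of peaks in one findpeaks() result.
# Assumes a column without peaks gives NULL; a zero-row matrix also works.
n_peaks <- function(p) if (is.null(p)) 0L else nrow(p)

# Stand-in for lapply(example1[-1], findpeaks, threshold = 2),
# with one column (b) that has no peaks
peak_info_demo <- list(
  a = matrix(c(245, 5, 4, 9), nrow = 1),
  b = NULL,
  c = matrix(c(476, 3, 2, 5,
               483, 6, 5, 7), nrow = 2, byrow = TRUE)
)

# vapply() always returns one integer per column, zeros included
peak_counts <- vapply(peak_info_demo, n_peaks, integer(1))
print(peak_counts)
```

On real data the same helper would be applied to peak_info. Indexing with example1[-1] selects every column except the first, so the column count does not need to be hard-coded.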
chemdork123
  • unlist(lapply) shouts out very loudly "sapply" or "vapply" to me... interesting package! +1 – tjebo Jun 30 '20 at 19:18
  • it does doesn't it :) – chemdork123 Jun 30 '20 at 21:27
  • When running `unlist(lapply(peak_info, nrow))` is there a way for it to output columns with a value of 0 as well? Right now, it is only giving me the values for columns that do have peaks. In addition, is there a way to replace the value of "7" in this line of code `peak_info <- lapply(example1[,2:7], findpeaks, threshold=2)` with a value that is unspecified, but tells the function to use all columns except the first one (I have multiple large datasets of differing size). Thank you! – tshast2 Jun 30 '20 at 22:29
  • Like I said, I would play around with the settings in the `findpeaks()` function. Do some searching and you'll find some examples that should help. For the second part, sure: that line will work for datasets with any number of columns (ignoring the first column) if you change it to `lapply(example1[,2:ncol(example1)], findpeaks, threshold=2)`. But as Tjebo said, I probably should have used `sapply()` or `vapply()`. – chemdork123 Jul 01 '20 at 11:58

If it is local maxima or minima that you are looking for, you may want to have a look at this post.

If that is what you are looking for, you only have to apply it to each column:

df <- data.frame(var1 = c(1,2,3,2,1, 1, 2, 1),
                 var2 = c(1,2,3,2,1, 1, 1, 1),
                 var3 = c(1,2,3,2,1, 5, 1, 2))

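# For each column, return the positions of the interior local maxima:
# a rise followed by a fall gives diff(sign(diff(x))) == -2.
# lapply() always returns a list, even if every column has the same number of peaks.
res_list <- lapply(df, function(x){
    return(which(diff(sign(diff(x)))==-2)+1)
})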

res_list :
$var1
[1] 3 7

$var2
[1] 3

$var3
[1] 3 6

Note that this does not detect extrema at the edges of your domain (the first or last point of a column).
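
If only the number of peaks per column is needed, the same idea can be reduced to a count. This is a minimal sketch using the diff(sign(diff(x))) test from above; the helper name count_local_maxima and the extra column var4 (a strictly increasing series with no interior peak) are my additions for illustration. It applies no height threshold, and like the code above it ignores edge points and flat-topped peaks.

```r
# Hypothetical helper: count strict interior local maxima in a numeric vector
count_local_maxima <- function(x) {
  sum(diff(sign(diff(x))) == -2)
}

df <- data.frame(var1 = c(1, 2, 3, 2, 1, 1, 2, 1),
                 var2 = c(1, 2, 3, 2, 1, 1, 1, 1),
                 var3 = c(1, 2, 3, 2, 1, 5, 1, 2),
                 var4 = c(1, 2, 3, 4, 5, 6, 7, 8))  # no interior peak

# One integer per column; columns without peaks get 0
peak_counts <- vapply(df, count_local_maxima, integer(1))
print(peak_counts)
```

Because vapply() returns a named integer vector, columns without any peak show up as 0 instead of being dropped.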

Rémi Coulaud