
I have a data frame with values for a particular time span. I have located local maxima using the function find_peaks; they are marked as TRUE in a column named peak:

test <- 
structure(list(year = 1996:2016, value = c(-0.5214506, -0.8037488, 
    0.1138524, 0.9939848, 1.7027944, 0.6448417, 0.1204489, -1.2254546, 
    -0.6733273, -0.7457323, 0.4874829, 2.2080809, 2.0609055, -2.5291374, 
    -1.5272201, 0.3057773, 0.1383523, -0.6455441, -0.8364883, -0.8907073, 
    -0.7940878), peak = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)), class = c("tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -21L))   

test
# A tibble: 21 x 3
    year  value peak 
   <int>  <dbl> <lgl>
 1  1996 -0.521 FALSE
 2  1997 -0.804 FALSE
 3  1998  0.114 FALSE
 4  1999  0.994 FALSE
 5  2000  1.70  TRUE 
 6  2001  0.645 FALSE
 7  2002  0.120 FALSE
 8  2003 -1.23  FALSE
 9  2004 -0.673 FALSE
10  2005 -0.746 FALSE
11  2006  0.487 FALSE
12  2007  2.21  TRUE 
13  2008  2.06  FALSE
14  2009 -2.53  FALSE
15  2010 -1.53  FALSE
16  2011  0.306 FALSE
17  2012  0.138 FALSE
18  2013 -0.646 FALSE
19  2014 -0.836 FALSE
20  2015 -0.891 FALSE
21  2016 -0.794 FALSE

I need to find the consecutive non-negative values that immediately precede each located peak (plus the peak itself). There are 2 peaks in this example, but there can be more. The result should look like this:

# A tibble: 5 x 3
   year value peak 
  <int> <dbl> <lgl>
1  1998 0.114 FALSE
2  1999 0.994 FALSE
3  2000 1.70  TRUE 
4  2006 0.487 FALSE
5  2007 2.21  TRUE 

I have tried a few things, but could not find a way to solve this. Any help would be appreciated.

Miha Trošt

2 Answers


This should work:

# where we put each wanted row of the table
outList <- list()
# a counter for the list above
j <- 0

# iterate over the rows of the table
for (i in seq_len(nrow(test))) {
  # if the row contains a peak
  if (test$peak[i]) {
    # put the peak row in the list
    j <- j + 1
    outList[[j]] <- test[i, ]
    # go backwards one row at a time while the values are positive;
    # m >= 1 stops the walk at the first row instead of failing there
    m <- i - 1
    while (m >= 1 && test$value[m] > 0) {
      j <- j + 1
      outList[[j]] <- test[m, ]
      m <- m - 1
    }
  }
}
# join all the rows together; unique() drops rows collected twice
# when two peaks share the same run of positive values
unique(do.call("rbind", outList))

The only problem is that the output is not in the same order as in your question. I am not sure how important that is.

# A tibble: 5 x 3
   year value peak 
  <int> <dbl> <lgl>
1  2000 1.70  TRUE 
2  1999 0.994 FALSE
3  1998 0.114 FALSE
4  2007 2.21  TRUE 
5  2006 0.487 FALSE
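If the original order matters, the collected rows can simply be sorted by year afterwards. A minimal base-R sketch, rebuilding the five rows printed above as a plain data frame (the name `res` is only for illustration):

```r
# The five rows from the output above, in the order the loop collected them
res <- data.frame(
  year  = c(2000L, 1999L, 1998L, 2007L, 2006L),
  value = c(1.7027944, 0.9939848, 0.1138524, 2.2080809, 0.4874829),
  peak  = c(TRUE, FALSE, FALSE, TRUE, FALSE)
)

# order() returns the row indices that sort the table by year
res_sorted <- res[order(res$year), ]
res_sorted$year
# 1998 1999 2000 2006 2007
```

With dplyr loaded, `arrange(res, year)` does the same.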
Marc P
  • Order is not important, because I can always sort it by year. – Miha Trošt Jun 01 '18 at 12:21
  • If the answer works for you, please consider accepting it. – Marc P Jun 01 '18 at 12:24
  • I am testing your solution. It works for the example data I provided, so thanks, but I get some duplicated rows in certain other cases. – Miha Trošt Jun 01 '18 at 12:27
  • Would you mind adding some comments to the code, so it is easier to understand? – Miha Trošt Jun 01 '18 at 12:43
  • Updated the answer with the comments, I hope it helps. – Marc P Jun 01 '18 at 13:18
  • Nice work! Just a comment: while the for-loop approach definitely works, it doesn't scale well to large datasets because of the overhead incurred for each row. It is better to use vectorized operations. I upvoted for the hard work, but I do not encourage this approach. – mt1022 Jun 01 '18 at 13:27
  • I completely agree, @Ryan showed that his approach was twice as fast as mine. Scalability will definitely be an issue with nested loops. – Marc P Jun 01 '18 at 15:23
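In the spirit of the vectorization comment, here is a base-R sketch that loops over the peaks only (usually few) instead of over every row. The helper `peak_runs()` and the `toy` data are hypothetical, not taken from either answer; unlike the answers, it uses `>= 0` to match "non-negative" from the question.

```r
# Hypothetical helper: keep each peak plus the run of non-negative values
# that ends at it, looping over peaks rather than rows.
peak_runs <- function(df) {
  r <- rle(df$value >= 0)
  # id of each run of same-sign values (what data.table::rleid() computes)
  run_id <- rep(seq_along(r$lengths), r$lengths)
  rows <- seq_len(nrow(df))
  keep <- logical(nrow(df))
  for (p in which(df$peak)) {
    # same run as the peak, at or before the peak, and non-negative
    keep <- keep | (run_id == run_id[p] & rows <= p & df$value >= 0)
  }
  # always keep the peak rows themselves
  keep[df$peak] <- TRUE
  df[keep, ]
}

toy <- data.frame(
  year  = 2001:2008,
  value = c(-1, 0.5, 2, 1, -0.3, 0.2, 1.5, 0.4),
  peak  = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)
peak_runs(toy)$year
# 2002 2003 2006 2007
```

Applied to `test` from the question, it should return the same five rows (1998, 1999, 2000, 2006, 2007). Because rows are flagged with a logical OR, two peaks in the same positive run cannot produce duplicates.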
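library(data.table)
setDT(test)

# npeak: number of peaks at or after each row (one group per peak)
# pos: id of each run of same-sign values
test[, `:=`(npeak = rev(cumsum(rev(peak)))
          ,  pos  = rleid(value >= 0))]
# flag the rows that lie in the same run as their group's peak
test[, preceding := pos == pos[peak]
     , by = npeak]
test[value > 0 & preceding, .(year, value, peak)]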
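To see what the two helper columns contain, here they are on a small made-up example (the vectors below are illustrative, not from the question):

```r
library(data.table)

peak  <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
value <- c(-1, 2, 1, -4, 3, 0.5)

# Reverse cumulative sum: rows up to and including the first peak get 2,
# rows after it up to the second peak get 1, rows after the last peak get 0
npeak <- rev(cumsum(rev(peak)))
npeak
# 2 2 1 1 1 0

# rleid() numbers consecutive runs of equal values, here runs of same sign
pos <- rleid(value >= 0)
pos
# 1 2 2 3 4 4
```

Within group npeak == 1 (rows 3 to 5), `pos[peak]` is 4, so only row 5 is in the peak's run; row 3 is positive but belongs to the run that follows the first peak, so it is dropped.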

Or, more concisely:

library(magrittr)

test[, preceding := rleid(value >= 0) %>% `==`(.[peak])
     , by = peak %>% rev %>% cumsum %>% rev
     ][value > 0 & preceding, .(year, value, peak)]

#    year     value  peak
# 1: 1998 0.1138524 FALSE
# 2: 1999 0.9939848 FALSE
# 3: 2000 1.7027944  TRUE
# 4: 2006 0.4874829 FALSE
# 5: 2007 2.2080809  TRUE
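The concise version relies on a magrittr rule: when `.` appears only inside a nested call such as `.[peak]`, the left-hand side is still passed as the first argument. A minimal sketch with illustrative vectors:

```r
library(magrittr)

runs    <- c(1L, 2L, 2L, 3L)
is_peak <- c(FALSE, FALSE, TRUE, FALSE)

# Equivalent to runs == runs[is_peak], i.e. runs == 2
same_run <- runs %>% `==`(.[is_peak])
same_run
# FALSE  TRUE  TRUE FALSE
```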

The same solution rewritten in dplyr syntax, using data.table::rleid():

library(dplyr)

test %>% 
  mutate(npeak = rev(cumsum(rev(peak))),
         pos = data.table::rleid(value >= 0)) %>% 
  # rows after the last peak have no peak to precede
  filter(npeak != 0) %>% 
  group_by(npeak) %>% 
  mutate(preceding = value > 0 & pos == pos[peak]) %>%
  ungroup() %>% 
  filter(preceding)

# A tibble: 5 x 6
   year value peak  npeak   pos preceding
  <int> <dbl> <lgl> <int> <int> <lgl>    
1  1998 0.114 FALSE     2     2 TRUE     
2  1999 0.994 FALSE     2     2 TRUE     
3  2000 1.70  TRUE      2     2 TRUE     
4  2006 0.487 FALSE     1     4 TRUE     
5  2007 2.21  TRUE      1     4 TRUE 
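If data.table is not available, dplyr itself provides `consecutive_id()` (added in dplyr 1.1.0), which numbers runs the same way as `rleid()`. A sketch assuming dplyr >= 1.1.0, on hypothetical toy data, that also drops the helper columns at the end:

```r
library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

toy <- tibble(
  year  = 2001:2008,
  value = c(-1, 0.5, 2, 1, -0.3, 0.2, 1.5, 0.4),
  peak  = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

res <- toy %>%
  mutate(npeak = rev(cumsum(rev(peak))),
         # run id of same-sign values, like data.table::rleid()
         pos   = consecutive_id(value >= 0)) %>%
  filter(npeak != 0) %>%
  group_by(npeak) %>%
  # keep positive rows in the same run as the group's peak
  filter(value > 0, pos == pos[peak]) %>%
  ungroup() %>%
  select(year, value, peak)

res$year
# 2002 2003 2006 2007
```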
IceCreamToucan