
I have a dataframe with dates/times (time series), site (grouping variable) and value. I have identified the start times of different 'surges', defined as increases in value of >=2 within 15 minutes. For each surge start, I am trying to find the date/time at which the value falls back down to (or below) its level at the start of the surge (i.e., the end of the surge).

I can achieve this using a recursive function ('find.next.smaller', from this question: In a dataframe, find the index of the next smaller value for each element of a column). This works perfectly on a small dataframe, but not on a large one, where I get the error message "Error: C stack usage 15925584 is too close to the limit". Having seen other similar questions (e.g., Error: C stack usage is too close to the limit), I do not think it's a problem of infinite recursion, but a memory issue. However, I do not know how to change the stack size from the shell (or PowerShell). Is there any other way, either by adjusting the memory available or by adapting the function below?

Some example code:

###df formatting    
library(dplyr)
df <- data.frame("Date_time" =seq(from=as.POSIXct("2022-01-01 00:00") , by= 15*60, to=as.POSIXct("2022-01-01 07:00")), 
             "Site" = rep(c("Site A", "Site B"), each = 29),
             "Value" = c(10,10.1,10.2,10.3,12.5,14.8,12.4,11.3,10.3,10.1,10.2,10.5,10.4,10.3,14.7,10.1,
                         16.7,16.3,16.4,14.2,10.2,10.1,10.3,10.2,11.7,13.2,13.2,11.1,11.4,
                         rep(10.3,times=29)))
df <- df %>% group_by(Site) %>% mutate(Lead_Value = lead(Value))
df$Surge_start <- NA
df[which(df$Lead_Value - df$Value >=2),"Surge_start"] <- 
 paste("Surge",seq(1,length(which(df$Lead_Value - df$Value >=2)),1),sep="")

###Applying the 'find.next.smaller' function

find.next.smaller <- function(ini = 1, vec) {
if(length(vec) == 1) NA 
else c(ini + min(which(vec[1] >= vec[-1])), 
     find.next.smaller(ini + 1, vec[-1]))
}       # the recursive function will go element by element through the vector and find out 
# the index of the next smaller value.
df$Date_time <- as.character(df$Date_time)
Output <- df %>% group_by(Site) %>% mutate(Surge_end = ifelse(grepl("Surge",Surge_start),Date_time[find.next.smaller(1, Value)],NA))
###This works fine

df2 <- do.call("rbind", replicate(1000, df, simplify = FALSE))
Output2 <- df2 %>% group_by(Site) %>% mutate(Surge_end = ifelse(grepl("Surge",Surge_start),Date_time[find.next.smaller(1, Value)],NA))
####This does not work
James White
  • `This works fine` ... when I run your code, I get `no non-missing arguments to min; returning Inf`, is that "fine" and a known thing? – r2evans May 25 '23 at 12:09
  • Hi, thanks for this! Yes, the warning messages are fine. Clearly a memory problem in R which I'm not sure how to overcome, whether that's by somehow adapting the C stack size (which I don't know how to do) or by adapting this 'find.next.smaller' code. – James White May 25 '23 at 12:22
  • I think the issue is exactly what it says: `evaluation nested too deeply` is a comment on how deeply one recurses in the function. R doesn't do tail-recursion "efficiently", it keeps everything on the stack. R is getting concerned (I don't know the limit, offhand), and perhaps rightfully so. – r2evans May 25 '23 at 12:24
  • Sounds like I need some kind of alternative to the 'find.next.smaller' function (highlighted here -https://stackoverflow.com/questions/38207584/in-a-dataframe-find-the-index-of-the-next-smaller-value-for-each-element-of-a-c) then. Thanks – James White May 25 '23 at 12:26
  • Likely. Don't do recursion like that with larger data, it's (obviously) not a good practice. To be precise, you're looking for the date (***after*** "today") where the value is closest and less than "today's" value? (Only considering "Surge" days, that is.) – r2evans May 25 '23 at 12:28

2 Answers


I suggest you don't need recursion.

find_nearest_value <- function(surge, time1, val1, times, vals) {
  # Return NA_character_ (not logical NA) so the result is always character
  if (!grepl("Surge", surge)) NA_character_ else times[times > time1 & vals <= val1][1]
}

Output %>%
  group_by(Site) %>%
  mutate(end2 = if_else(grepl("Surge", Surge_start),
                        mapply(find_nearest_value, Surge_start, Date_time, Value,
                               list(Date_time), list(Value)),
                        NA_character_)) %>%
  print(n=99)
# # A tibble: 58 × 7
# # Groups:   Site [2]
#    Date_time           Site   Value Lead_Value Surge_start Surge_end           end2               
#    <chr>               <chr>  <dbl>      <dbl> <chr>       <chr>               <chr>              
#  1 2022-01-01 00:00:00 Site A  10         10.1 NA          NA                  NA                 
#  2 2022-01-01 00:15:00 Site A  10.1       10.2 NA          NA                  NA                 
#  3 2022-01-01 00:30:00 Site A  10.2       10.3 NA          NA                  NA                 
#  4 2022-01-01 00:45:00 Site A  10.3       12.5 Surge1      2022-01-01 02:00:00 2022-01-01 02:00:00
#  5 2022-01-01 01:00:00 Site A  12.5       14.8 Surge2      2022-01-01 01:30:00 2022-01-01 01:30:00
#  6 2022-01-01 01:15:00 Site A  14.8       12.4 NA          NA                  NA                 
#  7 2022-01-01 01:30:00 Site A  12.4       11.3 NA          NA                  NA                 
#  8 2022-01-01 01:45:00 Site A  11.3       10.3 NA          NA                  NA                 
#  9 2022-01-01 02:00:00 Site A  10.3       10.1 NA          NA                  NA                 
# 10 2022-01-01 02:15:00 Site A  10.1       10.2 NA          NA                  NA                 
# 11 2022-01-01 02:30:00 Site A  10.2       10.5 NA          NA                  NA                 
# 12 2022-01-01 02:45:00 Site A  10.5       10.4 NA          NA                  NA                 
# 13 2022-01-01 03:00:00 Site A  10.4       10.3 NA          NA                  NA                 
# 14 2022-01-01 03:15:00 Site A  10.3       14.7 Surge3      2022-01-01 03:45:00 2022-01-01 03:45:00
# 15 2022-01-01 03:30:00 Site A  14.7       10.1 NA          NA                  NA                 
# 16 2022-01-01 03:45:00 Site A  10.1       16.7 Surge4      2022-01-01 05:15:00 2022-01-01 05:15:00
# 17 2022-01-01 04:00:00 Site A  16.7       16.3 NA          NA                  NA                 
# 18 2022-01-01 04:15:00 Site A  16.3       16.4 NA          NA                  NA                 
# 19 2022-01-01 04:30:00 Site A  16.4       14.2 NA          NA                  NA                 
# 20 2022-01-01 04:45:00 Site A  14.2       10.2 NA          NA                  NA                 
# 21 2022-01-01 05:00:00 Site A  10.2       10.1 NA          NA                  NA                 
# 22 2022-01-01 05:15:00 Site A  10.1       10.3 NA          NA                  NA                 
# 23 2022-01-01 05:30:00 Site A  10.3       10.2 NA          NA                  NA                 
# 24 2022-01-01 05:45:00 Site A  10.2       11.7 NA          NA                  NA                 
# 25 2022-01-01 06:00:00 Site A  11.7       13.2 NA          NA                  NA                 
# 26 2022-01-01 06:15:00 Site A  13.2       13.2 NA          NA                  NA                 
# 27 2022-01-01 06:30:00 Site A  13.2       11.1 NA          NA                  NA                 
# 28 2022-01-01 06:45:00 Site A  11.1       11.4 NA          NA                  NA                 
# 29 2022-01-01 07:00:00 Site A  11.4       NA   NA          NA                  NA                 
# 30 2022-01-01 00:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 31 2022-01-01 00:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 32 2022-01-01 00:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 33 2022-01-01 00:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 34 2022-01-01 01:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 35 2022-01-01 01:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 36 2022-01-01 01:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 37 2022-01-01 01:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 38 2022-01-01 02:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 39 2022-01-01 02:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 40 2022-01-01 02:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 41 2022-01-01 02:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 42 2022-01-01 03:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 43 2022-01-01 03:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 44 2022-01-01 03:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 45 2022-01-01 03:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 46 2022-01-01 04:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 47 2022-01-01 04:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 48 2022-01-01 04:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 49 2022-01-01 04:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 50 2022-01-01 05:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 51 2022-01-01 05:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 52 2022-01-01 05:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 53 2022-01-01 05:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 54 2022-01-01 06:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 55 2022-01-01 06:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 56 2022-01-01 06:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 57 2022-01-01 06:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 58 2022-01-01 07:00:00 Site B  10.3       NA   NA          NA                  NA                 
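
To see what the filter inside `find_nearest_value` does, here is a minimal sketch on six consecutive Site A readings from the example data (rows 4 to 9). The names `candidates` and `surge_end` are illustrative only, and the times are kept as POSIXct in UTC purely for this demonstration.

```r
# Six consecutive Site A readings, starting at the 00:45 surge (value 10.3)
times <- as.POSIXct("2022-01-01 00:45", tz = "UTC") + 15 * 60 * (0:5)
vals  <- c(10.3, 12.5, 14.8, 12.4, 11.3, 10.3)

# TRUE only for later rows whose value is back at or below the starting value
candidates <- times > times[1] & vals <= vals[1]

# The first such row marks the end of the surge (NA if no row qualifies)
surge_end <- times[candidates][1]
```

Only the last reading qualifies here, which corresponds to the 02:00 end time shown for Surge1 in the output above.
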
r2evans
  • @JamesWhite, I edited the function a little, make sure you see the check for `surge` – r2evans May 25 '23 at 12:41
  • Hi, apologies but this latest function isn't quite working for me. I get this error message "Error in `mutate()`: ! Problem while computing `end2 = if_else(...)`. ℹ The error occurred in group 1: Site = "Site A". Caused by error in `if_else()`: ! `false` must be a character vector, not a logical vector. Run `rlang::last_error()` to see where the error occurred." Before I got around it by simply quoting "NA" in the ifelse function, but doesn't work now – James White May 25 '23 at 12:44
  • Okay, change the `if_else(.., ..., NA)` to `if_else(..., ..., NA_character_)` or use `ifelse`. Know that this warning is a safeguard, and `base::ifelse` is not class-safe. – r2evans May 25 '23 at 12:56

Possibly the recursion uses too much memory, and you're probably better off with a vectorized/looped approach, even if it takes a bit longer. Below I made an alteration to your function and created some options.

Some options

Original:

find.next.smaller_rec <- function(ini = 1, vec) {
  if(length(vec) == 1) NA 
  else c(ini + min(which(vec[1] >= vec[-1])), 
         find.next.smaller_rec(ini + 1, vec[-1]))
}

The building block for the vectorized ones:

find.next.smaller <- function(val, vec) {
  if(val == length(vec)) NA  else val + min(which(vec[val] >= vec[-(1:val)]))
}

With a for loop:

find.next.smaller_for <- function(x, vec){
  result <- numeric(x)
  for(val in 1:x){
    result[val] <- find.next.smaller(val, vec)
  }
  result
}

With Vectorize():

find.next.smaller_vec <- Vectorize(find.next.smaller, "val")

With purrr::map:

library(purrr)  # provides map_dbl()
find.next.smaller_map <- function(x, vec){
  map_dbl(1:x, ~ find.next.smaller(val = .x, vec = vec))
}

Comparison:

bench <- bench::mark(find.next.smaller_rec(1, df$Value),
                     find.next.smaller_for(nrow(df), df$Value),
                     find.next.smaller_vec(1:nrow(df), df$Value),
                     find.next.smaller_map(nrow(df), df$Value),
                     min_time = 2)

bench %>% select(c(median, mem_alloc, n_gc, `gc/sec`))

    median mem_alloc  n_gc `gc/sec`
  <bch:tm> <bch:byt> <dbl>    <dbl>
1    496µs    92.4KB    13     7.30
2    582µs    77.1KB    10     5.46
3    612µs    78.7KB    10     5.97
4    681µs    77.1KB    10     5.40

We can see that, even if it's faster, the recursion uses more memory, and this might be the reason for your error.
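
As a rough illustration of why the recursive version runs into a limit on the large data (a sketch, not a measurement of the original function): each recursive call handles one element and then calls itself on `vec[-1]`, so the call depth equals the length of the vector. The helper `recursion_depth` below is hypothetical and only counts the calls.

```r
# Hypothetical helper: mirrors the shape of the recursive function
# (one call per element, recursing on vec[-1]) but only counts the depth.
recursion_depth <- function(vec, depth = 1L) {
  if (length(vec) == 1) depth else recursion_depth(vec[-1], depth + 1L)
}

# One nested call per row of a single site in df
depth_small <- recursion_depth(1:29)
```

In `df2` each site has 29,000 rows, so the original function would need roughly 29,000 nested calls per group, each holding its own shortened copy of the vector.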

There probably are even better options, I just wanted to present ones that were similar to your original one.
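
One such option, offered here as a sketch rather than as part of the answer as posted, is a single right-to-left pass with an explicit stack of candidate indices. The function name `next_smaller_or_equal` is hypothetical, and the sketch assumes `vec` contains no `NA` values. It needs no recursion and visits each element a bounded number of times.

```r
# Hypothetical helper: for each position i, return the index of the first
# later element j > i with vec[j] <= vec[i], or NA if there is none.
# Assumes vec has no NA values.
next_smaller_or_equal <- function(vec) {
  n <- length(vec)
  result <- rep(NA_integer_, n)
  stack <- integer(n)  # preallocated stack of candidate indices
  top <- 0L            # number of indices currently on the stack
  for (i in rev(seq_len(n))) {
    # Drop candidates strictly larger than vec[i]: they cannot be the answer
    # for i, and i is a nearer, smaller candidate for every earlier position.
    while (top > 0L && vec[stack[top]] > vec[i]) top <- top - 1L
    if (top > 0L) result[i] <- stack[top]
    # Push i as a candidate for the positions to its left
    top <- top + 1L
    stack[top] <- i
  }
  result
}

idx <- next_smaller_or_equal(c(3, 5, 4, 3, 1))
```

Under the same assumptions, `Date_time[next_smaller_or_equal(Value)]` could stand in for the indexing step inside a grouped `mutate()`.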

Applying them to the problem

Output <- df %>%
  group_by(Site) %>%
  mutate(Surge_end = ifelse(grepl("Surge",Surge_start),
                            Date_time[find.next.smaller_for(n(), Value)],
                            NA_character_))

Where you can also use Date_time[find.next.smaller_map(n(), Value)] or Date_time[find.next.smaller_vec(1:n(), Value)].
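
The `no non-missing arguments to min; returning Inf` warning mentioned in the comments comes from `min(which(...))` when no later value qualifies. A minimal sketch of a warning-free building block follows; the name `find_next_smaller_safe` is hypothetical, and it relies on `which(...)[1]` being `NA` when nothing matches.

```r
# Hypothetical variant of the building block: returns NA (without a warning)
# when no later element is <= the element at position val.
find_next_smaller_safe <- function(val, vec) {
  if (val >= length(vec)) return(NA_integer_)
  # Offset of the first later element at or below vec[val]; NA if none
  offset <- which(vec[-seq_len(val)] <= vec[val])[1]
  val + offset
}

v <- c(3, 5, 4, 3, 1)
ends <- vapply(seq_along(v), find_next_smaller_safe, integer(1), vec = v)
```

Indexing `Date_time` with an `NA` index gives `NA`, so rows without a surge end stay missing, as they do with the `Inf` result of the original.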

  • This is brilliant thanks, although the next step would be identifying the date/time where these next lower values occur? – James White May 25 '23 at 12:55
  • You're welcome! I don't understand. These functions don't accomplish what you want? – Ricardo Semião e Castro May 25 '23 at 12:57
  • It correctly identifies the next value that falls below (or matches) the start of a 'surge', but I would like to know the date of this (i.e., the surge end). Apologies this is my fault as I've edited the question since to make it clearer (I thought initially this was going to be a C stack / memory issue) – James White May 25 '23 at 13:01
  • Oh, it was the same way you were already doing, placing those functions inside that `ifelse` (with some minor modifications). Check my edit! – Ricardo Semião e Castro May 25 '23 at 13:07