Loop function on timeseries works on small df, but not in large df - Error: C stack usage...too close to the limit

Question

I have a dataframe with dates/times (time series), site (grouping var) and value. I have identified the start times of different 'surges', defined as increases in value of >=2 within 15 mins. For each surge start, I am trying to find the date/time at which the value falls back down to (or below) the value at the start of the surge (i.e., the end of the surge).

I can achieve this by using a recursive function ('find.next.smaller' from this question - In a dataframe, find the index of the next smaller value for each element of a column). This works perfectly on a smaller dataframe, but not on a large one, where I get the error message "Error: C stack usage 15925584 is too close to the limit". Having seen other similar questions (e.g., Error: C stack usage is too close to the limit), I do not think it's a problem of infinite recursion, but a memory issue. However, I do not know how to use the shell (or PowerShell) to address this. Is there any other way, either by adjusting my memory settings or by adapting the function below?

Some example code:

###df formatting    
library(dplyr)
df <- data.frame("Date_time" =seq(from=as.POSIXct("2022-01-01 00:00") , by= 15*60, to=as.POSIXct("2022-01-01 07:00")), 
             "Site" = rep(c("Site A", "Site B"), each = 29),
             "Value" = c(10,10.1,10.2,10.3,12.5,14.8,12.4,11.3,10.3,10.1,10.2,10.5,10.4,10.3,14.7,10.1,
                         16.7,16.3,16.4,14.2,10.2,10.1,10.3,10.2,11.7,13.2,13.2,11.1,11.4,
                         rep(10.3,times=29)))
df <- df %>% group_by(Site) %>% mutate(Lead_Value = lead(Value))
df$Surge_start <- NA
df[which(df$Lead_Value - df$Value >=2),"Surge_start"] <- 
 paste("Surge",seq(1,length(which(df$Lead_Value - df$Value >=2)),1),sep="")

###Applying the 'find.next.smaller' function

find.next.smaller <- function(ini = 1, vec) {
  # The recursive function goes element by element through the vector
  # and finds the index of the next smaller (or equal) value.
  if (length(vec) == 1) NA
  else c(ini + min(which(vec[1] >= vec[-1])),
         find.next.smaller(ini + 1, vec[-1]))
}
df$Date_time <- as.character(df$Date_time)
Output <- df %>% group_by(Site) %>% mutate(Surge_end = ifelse(grepl("Surge",Surge_start),Date_time[find.next.smaller(1, Value)],NA))
###This works fine

df2 <- do.call("rbind", replicate(1000, df, simplify = FALSE))
Output2 <- df2 %>% group_by(Site) %>% mutate(Surge_end = ifelse(grepl("Surge",Surge_start),Date_time[find.next.smaller(1, Value)],NA))
####This does not work

`This works fine` ... when I run your code, I get `no non-missing arguments to min; returning Inf`, is that "fine" and a known thing? — r2evans, May 25 '23 at 12:09
Hi, thanks for this! Yes, the warning messages are fine. Clearly a memory problem in R which I'm not sure how to overcome. Whether that's somehow adapting the C stack size (which I don't know how to do) or adapting this 'find.next.smaller' code. — James White, May 25 '23 at 12:22
I think the issue is exactly what it says: `evaluation nested too deeply` is a comment on how deeply one recurses in the function. R doesn't do tail-recursion "efficiently", it keeps everything on the stack. R is getting concerned (I don't know the limit, offhand), and perhaps rightfully so. — r2evans, May 25 '23 at 12:24
Sounds like I need some kind of alternative to the 'find.next.smaller' function (highlighted here -https://stackoverflow.com/questions/38207584/in-a-dataframe-find-the-index-of-the-next-smaller-value-for-each-element-of-a-c) then. Thanks — James White, May 25 '23 at 12:26
Likely. Don't do recursion like that with larger data, it's (obviously) not a good practice. To be precise, you're looking for the date (***after*** "today") where the value is closest and less than "today's" value? (Only considering "Surge" days, that is.) — r2evans, May 25 '23 at 12:28
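
To make the recursion-depth point in the comments above concrete, here is a minimal sketch. `count_depth` is a hypothetical helper (not part of the question's code) that recurses the same way `find.next.smaller` does, dropping one element per call, and simply reports how many nested calls were needed.

```r
# Hypothetical helper: recurses like find.next.smaller (one element dropped
# per call) and returns the number of nested calls that were made.
count_depth <- function(vec, depth = 1L) {
  if (length(vec) == 1) depth else count_depth(vec[-1], depth + 1L)
}

# One nested call per element of the vector.
count_depth(seq_len(29))
```

Each group in `df` has 29 rows, so the recursion is only 29 calls deep. In `df2` each group has 29,000 rows, so the same pattern would need 29,000 nested calls, which is where R's stack limits come into play.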

r2evans · Accepted Answer · 2023-05-25T12:40:59.100

I suggest you don't need recursion.

find_nearest_value <- function(surge, time1, val1, times, vals) {
  if (!grepl("Surge", surge)) NA else times[times > time1 & vals <= val1][1]
}

Output %>%
  group_by(Site) %>%
  mutate(end2 = if_else(grepl("Surge", Surge_start), mapply(find_nearest_value, Surge_start, Date_time, Value, list(Date_time), list(Value)), NA)) %>%
  print(n=99)
# # A tibble: 58 × 7
# # Groups:   Site [2]
#    Date_time           Site   Value Lead_Value Surge_start Surge_end           end2               
#    <chr>               <chr>  <dbl>      <dbl> <chr>       <chr>               <chr>              
#  1 2022-01-01 00:00:00 Site A  10         10.1 NA          NA                  NA                 
#  2 2022-01-01 00:15:00 Site A  10.1       10.2 NA          NA                  NA                 
#  3 2022-01-01 00:30:00 Site A  10.2       10.3 NA          NA                  NA                 
#  4 2022-01-01 00:45:00 Site A  10.3       12.5 Surge1      2022-01-01 02:00:00 2022-01-01 02:00:00
#  5 2022-01-01 01:00:00 Site A  12.5       14.8 Surge2      2022-01-01 01:30:00 2022-01-01 01:30:00
#  6 2022-01-01 01:15:00 Site A  14.8       12.4 NA          NA                  NA                 
#  7 2022-01-01 01:30:00 Site A  12.4       11.3 NA          NA                  NA                 
#  8 2022-01-01 01:45:00 Site A  11.3       10.3 NA          NA                  NA                 
#  9 2022-01-01 02:00:00 Site A  10.3       10.1 NA          NA                  NA                 
# 10 2022-01-01 02:15:00 Site A  10.1       10.2 NA          NA                  NA                 
# 11 2022-01-01 02:30:00 Site A  10.2       10.5 NA          NA                  NA                 
# 12 2022-01-01 02:45:00 Site A  10.5       10.4 NA          NA                  NA                 
# 13 2022-01-01 03:00:00 Site A  10.4       10.3 NA          NA                  NA                 
# 14 2022-01-01 03:15:00 Site A  10.3       14.7 Surge3      2022-01-01 03:45:00 2022-01-01 03:45:00
# 15 2022-01-01 03:30:00 Site A  14.7       10.1 NA          NA                  NA                 
# 16 2022-01-01 03:45:00 Site A  10.1       16.7 Surge4      2022-01-01 05:15:00 2022-01-01 05:15:00
# 17 2022-01-01 04:00:00 Site A  16.7       16.3 NA          NA                  NA                 
# 18 2022-01-01 04:15:00 Site A  16.3       16.4 NA          NA                  NA                 
# 19 2022-01-01 04:30:00 Site A  16.4       14.2 NA          NA                  NA                 
# 20 2022-01-01 04:45:00 Site A  14.2       10.2 NA          NA                  NA                 
# 21 2022-01-01 05:00:00 Site A  10.2       10.1 NA          NA                  NA                 
# 22 2022-01-01 05:15:00 Site A  10.1       10.3 NA          NA                  NA                 
# 23 2022-01-01 05:30:00 Site A  10.3       10.2 NA          NA                  NA                 
# 24 2022-01-01 05:45:00 Site A  10.2       11.7 NA          NA                  NA                 
# 25 2022-01-01 06:00:00 Site A  11.7       13.2 NA          NA                  NA                 
# 26 2022-01-01 06:15:00 Site A  13.2       13.2 NA          NA                  NA                 
# 27 2022-01-01 06:30:00 Site A  13.2       11.1 NA          NA                  NA                 
# 28 2022-01-01 06:45:00 Site A  11.1       11.4 NA          NA                  NA                 
# 29 2022-01-01 07:00:00 Site A  11.4       NA   NA          NA                  NA                 
# 30 2022-01-01 00:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 31 2022-01-01 00:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 32 2022-01-01 00:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 33 2022-01-01 00:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 34 2022-01-01 01:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 35 2022-01-01 01:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 36 2022-01-01 01:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 37 2022-01-01 01:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 38 2022-01-01 02:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 39 2022-01-01 02:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 40 2022-01-01 02:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 41 2022-01-01 02:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 42 2022-01-01 03:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 43 2022-01-01 03:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 44 2022-01-01 03:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 45 2022-01-01 03:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 46 2022-01-01 04:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 47 2022-01-01 04:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 48 2022-01-01 04:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 49 2022-01-01 04:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 50 2022-01-01 05:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 51 2022-01-01 05:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 52 2022-01-01 05:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 53 2022-01-01 05:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 54 2022-01-01 06:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 55 2022-01-01 06:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 56 2022-01-01 06:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 57 2022-01-01 06:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 58 2022-01-01 07:00:00 Site B  10.3       NA   NA          NA                  NA

@JamesWhite, I edited the function a little, make sure you see the check for `surge` — r2evans, May 25 '23 at 12:41
Hi, apologies but this latest function isn't quite working for me. I get this error message "Error in `mutate()`: ! Problem while computing `end2 = if_else(...)`. ℹ The error occurred in group 1: Site = "Site A". Caused by error in `if_else()`: ! `false` must be a character vector, not a logical vector. Run `rlang::last_error()` to see where the error occurred." Before I got around it by simply quoting "NA" in the ifelse function, but doesn't work now — James White, May 25 '23 at 12:44
Okay, change the `if_else(.., ..., NA)` to `if_else(..., ..., NA_character_)` or use `ifelse`. Know that this warning is a safeguard, and `base::ifelse` is not class-safe. — r2evans, May 25 '23 at 12:56
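
A small sketch of the type-safety point made in the last comment, assuming dplyr is installed. The vectors `flag` and `when` and the date `d` are made up for illustration.

```r
library(dplyr)

flag <- c(TRUE, FALSE)
when <- c("2022-01-01 02:00:00", "2022-01-01 03:45:00")

# A typed missing value keeps if_else() happy: the result is character.
safe <- if_else(flag, when, NA_character_)

# base::ifelse() builds its result from the logical test, so classes
# such as Date are silently dropped.
d <- as.Date("2022-01-01")
loose <- ifelse(TRUE, d, NA)
class(loose)  # "numeric", not "Date"
```

This is the trade-off mentioned above: `ifelse` accepts a bare `NA` without complaint, but it does not protect the class of the result.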

Ricardo Semião e Castro · Answer 2 · 2023-05-25T13:06:57.200

Possibly the recursion uses too much memory, and you're probably better off with a vectorized/looped approach, even if it takes a bit longer. Below I made an alteration to your function and created some options.

Some options

Original:

find.next.smaller_rec <- function(ini = 1, vec) {
  if(length(vec) == 1) NA 
  else c(ini + min(which(vec[1] >= vec[-1])), 
         find.next.smaller_rec(ini + 1, vec[-1]))
}

The building block for the vectorized ones:

find.next.smaller <- function(val, vec) {
  if(val == length(vec)) NA  else val + min(which(vec[val] >= vec[-(1:val)]))
}
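
As a quick illustration of what the building block returns, here is a sketch on a short vector. `vals` is made up for this example, using the Site A values from 00:45 to 02:00; the function definition is repeated so the snippet runs on its own.

```r
find.next.smaller <- function(val, vec) {
  if (val == length(vec)) NA else val + min(which(vec[val] >= vec[-(1:val)]))
}

vals <- c(10.3, 12.5, 14.8, 12.4, 11.3, 10.3)

find.next.smaller(1, vals)  # 6: first later position with a value <= 10.3
find.next.smaller(2, vals)  # 4: first later position with a value <= 12.5
find.next.smaller(6, vals)  # NA: the last element has nothing after it
```

Each call handles a single position `val`, so there is no recursion; the wrappers below only differ in how they iterate over positions.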

With a for loop:

find.next.smaller_for <- function(x, vec){
  result <- numeric(x)
  for(val in 1:x){
    result[val] <- find.next.smaller(val, vec)
  }
  result
}

With Vectorize():

find.next.smaller_vec <- Vectorize(find.next.smaller, "val")

With purrr::map:

find.next.smaller_map <- function(x, vec){
  # map_dbl comes from purrr; namespaced so library(purrr) is not required
  purrr::map_dbl(1:x, ~ find.next.smaller(val = .x, vec = vec))
}

Comparison:

bench <- bench::mark(find.next.smaller_rec(1, df$Value),
                     find.next.smaller_for(nrow(df), df$Value),
                     find.next.smaller_vec(1:nrow(df), df$Value),
                     find.next.smaller_map(nrow(df), df$Value),
                     min_time = 2)

bench %>% select(c(median, mem_alloc, n_gc, `gc/sec`))

    median mem_alloc  n_gc `gc/sec`
  <bch:tm> <bch:byt> <dbl>    <dbl>
1    496µs    92.4KB    13     7.30
2    582µs    77.1KB    10     5.46
3    612µs    78.7KB    10     5.97
4    681µs    77.1KB    10     5.40

We can see that, even if it's faster, the recursion uses more memory, and this might be the reason for your error.

There probably are even better options, I just wanted to present ones that were similar to your original one.
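
One such option, sketched here as an assumption rather than something from the answers above, is a single pass that keeps a stack of positions still waiting for their next smaller-or-equal value. `next_smaller_idx` is a hypothetical name, and the sketch assumes `vec` contains no `NA` values.

```r
# Hypothetical helper (not from the answers above): for each position i,
# return the first j > i with vec[j] <= vec[i], or NA if there is none.
# Assumes vec has no NA values.
next_smaller_idx <- function(vec) {
  n <- length(vec)
  out <- rep(NA_integer_, n)
  stack <- integer(n)  # preallocated stack of positions still unresolved
  top <- 0L
  for (j in seq_len(n)) {
    # Every waiting position whose value is >= vec[j] is resolved by j.
    while (top > 0L && vec[stack[top]] >= vec[j]) {
      out[stack[top]] <- j
      top <- top - 1L
    }
    top <- top + 1L
    stack[top] <- j
  }
  out
}

vals <- c(10.3, 12.5, 14.8, 12.4, 11.3, 10.3)
next_smaller_idx(vals)  # 6 4 4 5 6 NA
```

Each position is pushed and popped at most once, so the work grows linearly with the length of the vector, and positions with no later smaller-or-equal value come back as `NA` rather than `Inf` with a warning. Inside a grouped `mutate`, it could be used as `Date_time[next_smaller_idx(Value)]`.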

Applying them to the problem

Output <- df %>%
  group_by(Site) %>%
  mutate(Surge_end = ifelse(grepl("Surge",Surge_start),
                            Date_time[find.next.smaller_for(n(), Value)],
                            NA_character_))

Where you can also use Date_time[find.next.smaller_map(n(), Value)] or Date_time[find.next.smaller_vec(1:n(), Value)].

This is brilliant thanks, although the next step would be identifying the date/time where these next lower values occur? — James White, May 25 '23 at 12:55
You're welcome! I don't understand. Don't these functions accomplish what you want? — Ricardo Semião e Castro, May 25 '23 at 12:57
It correctly identifies the next value that falls below (or matches) the start of a 'surge', but I would like to know the date of this (i.e., the surge end). Apologies this is my fault as I've edited the question since to make it clearer (I thought initially this was going to be a C stack / memory issue) — James White, May 25 '23 at 13:01
Oh, it was the same way you were already doing, placing those functions inside that `ifelse` (with some minor modifications). Check my edit! — Ricardo Semião e Castro, May 25 '23 at 13:07
