    output_list = lapply(seq_along(input_df), generate_data)

I'm looping through many data frames and applying my own function, `generate_data()`, which returns a list.

Each of those lists becomes one element of `output_list`. If the loop times out partway through, how can I keep the lists that have already been generated?

I've looked into the functions `tryCatch()`, `withTimeout()`, and `memory.limit()`.
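For comparison, here is a minimal sketch of the per-element `tryCatch()` pattern. The `generate_data()` below is a hypothetical stand-in that fails on element 3, and `input_df` is placeholder data. The pattern only catches errors raised inside R for a single element; it cannot preserve anything if the whole process is killed from outside, which is the situation described in the comments.

```r
# Hypothetical stand-in for the real generate_data(): fails on element 3
generate_data <- function(i) {
  if (i == 3) stop("simulated failure on element ", i)
  list(id = i, value = i^2)
}

# Placeholder input: five small data frames
input_df <- lapply(1:5, function(i) data.frame(x = i))

# Wrap each call so that a failure in one element yields NULL instead of
# aborting the whole lapply()
output_list <- lapply(seq_along(input_df), function(i) {
  tryCatch(generate_data(i), error = function(e) NULL)
})

str(output_list)  # elements 1, 2, 4 and 5 hold results; element 3 is NULL
```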


The solution to my problem was to use `assign()` to store each list in a new environment as soon as it is generated:

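    e = new.env()  # environment that collects each result as it is produced
    output_list = lapply(seq_along(input_df), function(x) {
      temp = generate_data(x)
      # Store a copy in e right away, outside lapply's return value
      assign(paste0("df_", x), temp, envir = e)
      return(temp)
    })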
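To see what this buys you, here is a sketch that simulates a failure partway through the loop and then pulls the finished results back out of `e`. `generate_data()` and `input_df` are hypothetical stand-ins, and `try()` plays the role of the interruption. Note that `e` lives in the R session's memory, so this only helps while the session itself survives.

```r
e <- new.env()  # must exist before lapply() starts

# Hypothetical generate_data() that errors at element 4, standing in for
# the timeout
generate_data <- function(x) {
  if (x == 4) stop("simulated timeout at element ", x)
  data.frame(id = x, value = x * 10)
}
input_df <- lapply(1:6, function(i) data.frame(x = i))  # placeholder input

res <- try(
  lapply(seq_along(input_df), function(x) {
    temp <- generate_data(x)
    assign(paste0("df_", x), temp, envir = e)
    temp
  }),
  silent = TRUE
)

# lapply() never returned, so res is a "try-error", but e still holds
# every result assigned before the failure
keys <- ls(e)
# Sort numerically so that df_10 comes after df_9
keys <- keys[order(as.integer(sub("df_", "", keys, fixed = TRUE)))]
partial <- mget(keys, envir = e)
names(partial)  # "df_1" "df_2" "df_3"
```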

Update: scratch that. The solution to my problem was to use `save()` inside the `lapply` loop, writing each result to disk as soon as it is generated:

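    dir.create("DF", showWarnings = FALSE)  # save() errors if the folder doesn't exist
    output_list = lapply(seq_along(input_df), function(x) {
      temp = generate_data(x)
      # Write this element to its own file before moving on to the next one
      save(temp, file = paste0("DF/df_", x, ".RData"))
      return(temp)
    })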

Saving to disk means I don't lose all my data if the session crashes.
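Because each element is checkpointed on disk, a rerun after a crash can skip the work already done. A minimal sketch, assuming one `.RData` file per element named `df_<index>.RData`; `out_dir`, `generate_data()` and `input_df` are placeholders, and the folder is created under `tempdir()` only so the example is safe to run.

```r
# Point out_dir at your own "DF" folder in practice
out_dir <- file.path(tempdir(), "DF")
dir.create(out_dir, showWarnings = FALSE)

generate_data <- function(x) data.frame(id = x, value = x * 10)  # hypothetical stand-in
input_df <- lapply(1:5, function(i) data.frame(x = i))            # placeholder input

output_list <- lapply(seq_along(input_df), function(x) {
  path <- file.path(out_dir, paste0("df_", x, ".RData"))
  if (file.exists(path)) {
    # Finished in an earlier run: reload instead of recomputing
    env <- new.env()
    load(path, envir = env)  # restores the object saved under the name `temp`
    return(env$temp)
  }
  temp <- generate_data(x)
  save(temp, file = path)  # checkpoint this element before moving on
  temp
})
```

`saveRDS()`/`readRDS()` would do the same job without tying each file to a fixed object name like `temp`.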

  • `lapply` doesn't have this mechanism built in, so you'll need to add that logic inside it (in `generate_data`). One option is `function(i) tryCatch(generate_data(i), error = function(e) e)`, which has two notable consequences: if one input fails or times out, the loop continues with the next one; and with `function(e) e`, the error object becomes that element's return value, so you could use `function(e) NULL` instead. Over to you. – r2evans Jun 02 '18 at 07:17
  • You might also have luck with `withCallingHandlers`, though I'm not ready to explain the "how" of this ... see https://stackoverflow.com/a/20578779/3358272 and https://stackoverflow.com/a/32172793/3358272 for some great examples from @MartinMorgan. – r2evans Jun 02 '18 at 07:24
  • What does "times out" mean exactly? – Roland Jun 02 '18 at 07:58
  • The code is running on a cluster that returns a 502 error at some point during the loop. After the timeout, `output_list` does not exist in any form. – user5720052 Jun 02 '18 at 08:01
  • You can also break the data frame into smaller data frames. – Ayush Nigam Jun 02 '18 at 10:00
  • I think `list2env` might be the solution to my problem. I've edited the question to include this. – user5720052 Jun 02 '18 at 11:03
  • One should rarely if ever use `list2env` and `assign`. Curious, why did `tryCatch` not work? – Parfait Jun 02 '18 at 15:40
  • @Parfait tryCatch won't solve the problem because the whole lapply loop is timing out, not elements of the list. I'm trying to use list2env so that if that happens, all the elements generated so far still exist as objects. – user5720052 Jun 02 '18 at 23:22
  • @Parfait, I see your point now. I only needed `assign`, not `list2env`: `assign(paste0("df_", x), temp, envir = e)` – user5720052 Jun 03 '18 at 00:02
  • If this indeed solves your problem, you should use a `for` loop instead of `lapply`. – Roland Jun 04 '18 at 06:12
  • @Roland I've updated the original post with my current solution. I'm saving to disk. – user5720052 Jun 04 '18 at 10:43
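Roland's suggestion can be sketched like this: with a preallocated list filled by a `for` loop, every element finished before an error is already stored in `output_list`, with no extra environment needed. As with `assign()`, this is in-memory only and does not survive a crashed session; `generate_data()` and `input_df` are hypothetical stand-ins, and `try()` simulates the interruption.

```r
# Hypothetical generate_data() that errors at element 4, standing in for
# the timeout
generate_data <- function(x) {
  if (x == 4) stop("simulated timeout at element ", x)
  data.frame(id = x, value = x * 10)
}
input_df <- lapply(1:6, function(i) data.frame(x = i))  # placeholder input

# Preallocate; each completed element is stored as soon as it is produced
output_list <- vector("list", length(input_df))

res <- try({
  for (x in seq_along(input_df)) {
    output_list[[x]] <- generate_data(x)
  }
}, silent = TRUE)

# Indices of the elements that were finished before the failure
done <- which(!vapply(output_list, is.null, logical(1)))
done  # 1 2 3
```

To resume, the loop could run over the still-`NULL` indices instead of `seq_along(input_df)`.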

0 Answers