I am currently working on a task that applies a function to a fairly large list of tibbles, approximately 30,000 elements. The code I'm using is as follows:
plan(multisession, workers = 20)

hpar$input_df %>%
  group_by(key) %>%
  group_split() %>%
  future_walk(sf_do_all_one_series,
              hpar$df_stockout_decaying,
              hpar$out_path,
              hpar$out_prefix,
              .env_globals = empty_env())
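For clarity, the pipeline above is equivalent to materialising the per-key list first and then walking over it; the sketch below only makes the ~30,000-element list explicit (split_list is a hypothetical intermediate name, and the library() calls reflect the packages involved):

library(future)
library(furrr)
library(dplyr)
library(rlang)   # empty_env()

plan(multisession, workers = 20)

# Hypothetical intermediate: the list of ~30,000 single-key tibbles
# that future_walk iterates over.
split_list <- hpar$input_df %>%
  group_by(key) %>%
  group_split()

future_walk(split_list,
            sf_do_all_one_series,
            hpar$df_stockout_decaying,
            hpar$out_path,
            hpar$out_prefix,
            .env_globals = empty_env())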
The function sf_do_all_one_series, which is called by future_walk, has the following structure:
sf_do_all_one_series <- function(df_1, df_stockout_decaying, out_path, out_prefix) {
  key <- df_1 %>% distinct(key) %>% pull()
  do_stuff(df_1, df_stockout_decaying) %>%
    do_more_stuff() %>%
    write.table(file = paste0(out_path, out_prefix, "_", key, ".csv"),
                quote = FALSE, sep = "\t", row.names = FALSE)
  invisible()
}
The hpar$input_df tibble consists of approximately 3 million records, and its size, calculated using object.size(.), is around 202 MB. On the other hand, hpar$df_stockout_decaying is a small tibble containing constant values. Lastly, hpar$out_path and hpar$out_prefix are character strings.
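For reference, the size figures come from calls along these lines (a minimal sketch; the lobstr line is only an optional cross-check and assumes that package is installed):

format(object.size(hpar$input_df), units = "MB")           # ~202 MB
format(object.size(hpar$df_stockout_decaying), units = "Kb")

# Optional cross-check: lobstr::obj_size() accounts for shared memory,
# which object.size() does not.
# lobstr::obj_size(hpar$input_df)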
The issue I'm encountering is that memory usage increases dramatically over time during this process, as if some intermediate output were being retained. I'm seeking guidance on understanding the potential cause of this memory growth and any possible solutions.
I have made several attempts to address the memory issue but haven't been successful so far. Here are the steps I've taken:
Removing intermediate objects: I tried minimizing the use of intermediate objects within the code to reduce memory consumption.

Garbage collector: I also explicitly called the garbage collector using the gc() function to free up any unused memory (a sketch of where I placed the call follows this list).

.env_globals = empty_env(): Although I don't believe the .env_globals = empty_env() argument is necessary for future_walk, I still tried including it in the hope that it might have an impact.
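For concreteness, the gc() attempt looked roughly like this; this is a sketch reconstructing the cleanup step at the end of the worker function, and the exact placement in my runs may have differed slightly:

sf_do_all_one_series <- function(df_1, df_stockout_decaying, out_path, out_prefix) {
  key <- df_1 %>% distinct(key) %>% pull()
  do_stuff(df_1, df_stockout_decaying) %>%
    do_more_stuff() %>%
    write.table(file = paste0(out_path, out_prefix, "_", key, ".csv"),
                quote = FALSE, sep = "\t", row.names = FALSE)
  rm(key)   # drop the only local object (part of the "removing intermediates" attempt)
  gc()      # explicit garbage collection inside each worker call
  invisible()
}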
Unfortunately, despite implementing these measures, memory usage has not shown any noticeable improvement.
I would greatly appreciate any suggestions or insights to help resolve this issue.