I have a long-running process in R (via RStudio Server) which I suspect has a memory problem that eventually crashes the R session. Unfortunately I am not around to monitor what is going on. Do crash logs exist, and if so, where do I find them?
My setup is as follows:
- RStudio Server installed on Ubuntu 12.04, in a VMware Player virtual machine.
- I access the R session from Firefox on a Windows 7 installation.
I leave the program running overnight and come back to find the following error message in the RStudio interface:
The previous R session was abnormally terminated due to an unexpected crash. You may have lost workspace data as a result of this crash.
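Until the real logs are located, one stopgap for an unattended run is to have the script append a timestamped memory reading to a file at intervals, so the last lines written before a crash survive on disk. A minimal sketch, assuming a hypothetical helper `log_memory()` built only on base R's `gc()`, `Sys.time()` and `cat(..., append = TRUE)`. Note that `gc()` reports memory managed by R itself, not the total size of the process:

```r
# Hypothetical helper: append a timestamped memory snapshot to a log file.
log_memory <- function(logfile, label = "") {
  # gc() returns a matrix; column 2 is the memory currently in use, in Mb
  used_mb <- sum(gc()[, 2])
  line <- sprintf("%s\t%s\t%.1f Mb\n",
                  format(Sys.time(), "%Y-%m-%d %H:%M:%S"), label, used_mb)
  # append = TRUE keeps earlier lines, so the file survives a later crash
  cat(line, file = logfile, append = TRUE)
  invisible(used_mb)
}

logfile <- tempfile(fileext = ".log")  # in practice, a fixed path you can inspect
log_memory(logfile, "start")
x <- numeric(1e6)                      # allocate roughly 8 Mb
log_memory(logfile, "after allocation")
readLines(logfile)
```

Calling something like this every few thousand formulas would show whether memory climbs steadily before the session dies.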
It appears that the following code is causing the problem (this is not a reproducible example). The code takes a list of about 250k regression formulas and a data frame of 1500 rows by 70 columns, and lets you specify the number of cores to use in the calculation:
```r
get_models_rsquared <- function(combination_formula, df, cores = 1) {
  # load 'parallel' first: detectCores() and mclapply() both come from it
  require(parallel)
  if (identical(cores, "ALL")) cores <- detectCores()

  # fit each linear model once, in parallel, keeping the adjusted R-squared
  # and the number of rows dropped because of missing values;
  # if parallel processing is not required, mclapply can be replaced by lapply
  combination_fitted_models_rsq <- mclapply(seq_along(combination_formula), function(i) {
    s <- summary(lm(combination_formula[[i]], data = df))
    list(s$adj.r.squared, length(s$na.action))
  }, mc.cores = cores)

  # extract the adjusted R-squared and the missing-row counts as numeric vectors
  temp_1 <- as.numeric(lapply(combination_fitted_models_rsq, `[[`, 1))
  temp_2 <- as.numeric(lapply(combination_fitted_models_rsq, `[[`, 2))

  # this is the problematic line (unlike temp_1 and temp_2, temp_3 stays a list)
  temp_3 <- lapply(seq_along(combination_formula), function(i) {
    # number of predictors in each formula used for the linear regression
    length(attributes(terms.formula(combination_formula[[i]]))$term.labels)
  })

  result <- data.frame(temp_1, temp_2, temp_3)
  names(result) <- c("rsquared", "length.na", "number_of_terms")
  return(result)
}
```
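Because the crash discards the whole workspace, one way to avoid losing an overnight run is to process the formulas in chunks and save each chunk to disk as it finishes. The sketch below uses a hypothetical helper `fit_in_chunks()` and an output directory of your choosing; it relies only on `mclapply()`, `saveRDS()` and `readRDS()`, and skips chunks whose file already exists so a restarted run resumes where it stopped. The toy formulas are built with `combn()` and `reformulate()` on synthetic data, standing in for the real list of about 250k formulas:

```r
library(parallel)  # mclapply()

# Hypothetical helper: fit models chunk by chunk, saving each chunk to an .rds file.
fit_in_chunks <- function(combination_formula, df, out_dir,
                          chunk_size = 1000, cores = 1) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  idx <- seq_along(combination_formula)
  chunks <- split(idx, ceiling(idx / chunk_size))
  for (j in seq_along(chunks)) {
    out_file <- file.path(out_dir, sprintf("chunk_%05d.rds", j))
    if (file.exists(out_file)) next  # finished in an earlier run
    res <- mclapply(chunks[[j]], function(i) {
      s <- summary(lm(combination_formula[[i]], data = df))
      c(rsquared = s$adj.r.squared, length.na = length(s$na.action))
    }, mc.cores = cores)
    saveRDS(do.call(rbind, res), out_file)  # one matrix row per formula
  }
  files <- list.files(out_dir, pattern = "^chunk_.*\\.rds$", full.names = TRUE)
  do.call(rbind, lapply(sort(files), readRDS))
}

# Synthetic usage: every model with one or two of three predictors (6 formulas)
set.seed(1)
df <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
predictors <- setdiff(names(df), "y")
fmls <- do.call(c, lapply(1:2, function(k)
  combn(predictors, k, FUN = function(v) reformulate(v, response = "y"),
        simplify = FALSE)))
out <- fit_in_chunks(fmls, df, file.path(tempdir(), "fits"), chunk_size = 4)
dim(out)
```

Chunk size trades the amount of work lost in a crash against the number of small files written.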
The calculation of temp_3 seems to cause the problem when the function is called. However, everything works properly if I take the code for temp_3 out of the function and calculate it after running the function.
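A likely explanation, inferred from the code rather than confirmed by any crash log: temp_3 is a list, and `data.frame()` turns each element of an unnamed list into its own column, recycling length-one elements to the length of temp_1. With about 250k formulas that would mean roughly 250k columns of 250k rows each, far beyond the memory of the virtual machine. Outside the function, temp_3 is presumably never passed to `data.frame()`, which would explain why it works there. The small example below, with made-up values for temp_1 and temp_2, shows the column blow-up and a `vapply()` version that returns an integer vector instead:

```r
fmls <- list(y ~ x1, y ~ x1 + x2, y ~ x1 * x2)
temp_1 <- c(0.10, 0.25, 0.30)  # made-up adjusted R-squared values
temp_2 <- c(0, 0, 2)           # made-up counts of missing rows

# As in the question: lapply() returns a list, one element per formula
temp_3 <- lapply(fmls, function(f) length(attr(terms(f), "term.labels")))
bad <- data.frame(temp_1, temp_2, temp_3)
dim(bad)   # 3 rows, 5 columns: each list element became a column

# vapply() returns a plain integer vector, giving a single column
temp_3 <- vapply(fmls, function(f) length(attr(terms(f), "term.labels")), integer(1))
good <- data.frame(rsquared = temp_1, length.na = temp_2, number_of_terms = temp_3)
dim(good)  # 3 rows, 3 columns
good$number_of_terms
```

Wrapping the original `lapply()` call in `as.numeric()`, as is already done for temp_1 and temp_2, would have the same effect.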