I have a long-running process in R (via RStudio Server) which I suspect has a memory problem that eventually crashes the R session. Unfortunately I am not around to monitor exactly what is going on. Do crash logs exist, and if so, where do I find them?

My setup is as follows:

  • RStudio Server installed on Ubuntu 12.04, running in a VMware Player virtual machine.
  • I access the R session from Firefox on a Windows 7 installation.
  • I leave the program running overnight and come back to find the following error message in the RStudio interface:

    The previous R session was abnormally terminated due to an unexpected crash. You may have lost workspace data as a result of this crash.
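
Because the crash happens unattended, one option is to have the script leave its own trail on disk. The following is a minimal sketch, not an RStudio Server feature: `log_memory` and the log file location are hypothetical names chosen for illustration. It relies only on base R's `gc()`, which reports memory for the current R process and therefore does not see the forked `mclapply` workers.

```r
# Hypothetical helper: append a timestamped memory reading to a log file.
# gc() covers the current R process only, not forked worker processes.
log_memory <- function(label, log_file = "memory_log.txt") {
  mem <- gc()                 # matrix; column 2 is memory in use, in Mb
  used_mb <- sum(mem[, 2])
  line <- sprintf("%s | %s | %.1f Mb in use",
                  format(Sys.time(), "%Y-%m-%d %H:%M:%S"), label, used_mb)
  cat(line, "\n", sep = "", file = log_file, append = TRUE)
  invisible(used_mb)
}

log_file <- tempfile(fileext = ".txt")   # assumed location for this demo
log_memory("before model fitting", log_file)
x <- numeric(1e6)                        # stand-in for the real work
log_memory("after model fitting", log_file)
readLines(log_file)
```

Calling the helper between the main steps of a long script means the last line written before a crash shows roughly where the session died.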

It appears that the following code is causing the problem (not a reproducible sample). The code takes a list of regression formulas (about 250k), a data frame of 1500 rows by 70 columns, and also allows you to specify the number of cores to be used in the calculation:

get_models_rsquared = function(combination_formula,df,cores = 1){
  #load parallel first so that detectCores() and mclapply() are available;
  #if parallel processing is not required, mclapply should be changed to lapply.
  require(parallel)
  if (cores == "ALL"){cores <- detectCores()}

  #using mclapply to calculate linear models in parallel,
  #storing adjusted r-squared and number of missing rows in a list
  combination_fitted_models_rsq = mclapply(seq_along(combination_formula), function(i) {
    #fit and summarise each model once, then reuse the summary
    model_summary = summary(lm(data = df, combination_formula[[i]]))
    list(model_summary$adj.r.squared, length(model_summary$na.action))
  }, mc.cores = cores)

  #now store the adjusted r-squared and missing rows of data
  temp_1 = lapply(seq_along(combination_fitted_models_rsq), function(i) 
    combination_fitted_models_rsq[[i]][[1]])
  temp_1 = as.numeric(temp_1)

  temp_2 = lapply(seq_along(combination_fitted_models_rsq), function(i) 
    combination_fitted_models_rsq[[i]][[2]])
  temp_2 = as.numeric(temp_2)

  #this is the problematic line
  temp_3 =  lapply(seq_along(combination_formula), function(i) {
    length(attributes(terms.formula(combination_formula[[i]]))$term.labels)
  }#tells you number of predictors in each formula used for linear regression
   )#end lapply
  result = data.frame(temp_1,temp_2,temp_3)
  names(result) = c("rsquared","length.na","number_of_terms")
  return(result)
}

The calculation of temp_3 seems to cause the problem when the function is called. However, everything works properly if you take the code for temp_3 out of the function and calculate it after running the function.
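
The term count used for `temp_3` can be tried in isolation on a handful of formulas. This is a minimal sketch, assuming three made-up formulas; `toy_formulas` and `count_terms` are hypothetical names. It also shows that `lapply` hands back a list, whereas `vapply` returns a plain integer vector.

```r
# Made-up formulas, for illustration only
toy_formulas <- list(mpg ~ wt,
                     mpg ~ wt + hp,
                     mpg ~ wt + hp + wt:hp)

# Number of predictor terms in a single formula
count_terms <- function(f) length(attr(terms(f), "term.labels"))

as_list   <- lapply(toy_formulas, count_terms)              # a list
as_vector <- vapply(toy_formulas, count_terms, integer(1))  # an integer vector

class(as_list)
class(as_vector)
as_vector
```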

Alex
  • Ok, I've just managed to duplicate the problem live. My ubuntu system monitor tells me that the process starts using a lot of memory after about 30 minutes. I can't quite work out which bit of the code is causing the problem though, I will see whether I can find any memory usage diagnostic help here. – Alex Apr 11 '13 at 02:02
  • What do you have in memory when you startup the session? – Ricardo Saporta Apr 11 '13 at 02:46
  • Are you running some script in R session or does it just crash without any script? – CHP Apr 11 '13 at 02:56
  • @RicardoSaporta what do you mean? My ubuntu virtual machine has about 12 gb memory free for R. @geektrader I am running a script, which calls a function which uses the parallelised `mclapply`. – Alex Apr 11 '13 at 03:11
  • There is probably a file called `core` in the working directory or in your user directory. You could probably read that with [gdb](http://www.gnu.org/software/gdb/gdb.html) on linux (see also [this post](http://www.cyberciti.biz/tips/linux-core-dumps.html)). – GSee Apr 11 '13 at 03:21
  • @GSee yes, I see a core file. Thanks, I'll have a look :) – Alex Apr 11 '13 at 03:24
  • I've solved my particular problem! `temp_1` and `temp_2` are numeric vectors. `temp_3` is a list. `data.frame(temp_1,temp_2,temp_3)` doesn't work for this reason! **This** works fine: `data.frame(temp_1,temp_2,unlist(temp_3))` – Alex Apr 11 '13 at 05:08
  • and here is the reference to using `unlist` before coercing into a data frame: http://stackoverflow.com/questions/4227223/r-list-to-data-frame – Alex Apr 11 '13 at 05:14
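
The behaviour described in the last two comments can be reproduced at small scale. This is a minimal sketch with made-up values standing in for `temp_1`, `temp_2` and `temp_3`. When `data.frame` is given a list, it turns each list element into its own column, so a list of n elements adds n columns instead of one. With about 250k formulas that is a likely explanation for the memory growth, although this was not measured here.

```r
# Made-up stand-ins for the three intermediate results
temp_1 <- c(0.10, 0.25, 0.40, 0.55)   # adjusted r-squared values
temp_2 <- c(0, 1, 0, 2)               # number of missing rows
temp_3 <- list(1, 2, 3, 2)            # term counts, still a list

# Passing the list directly: one column per list element
wide <- data.frame(temp_1, temp_2, temp_3)
dim(wide)

# Flattening the list first: a single column, as intended
result <- data.frame(temp_1, temp_2, unlist(temp_3))
names(result) <- c("rsquared", "length.na", "number_of_terms")
dim(result)
```

Replacing the `lapply` call with `vapply(..., integer(1))` or `sapply` would avoid creating the list in the first place.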
