I have an R script that I am trying to run in RStudio on Ubuntu 18.04, which is dual-booted with Windows 10. The session info is:
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.4 tools_3.4.4
To start off, this script was originally written on Windows and ran fine there (I am trying it on Ubuntu because I need a specific package that runs better on Linux). On the same machine, the script runs out of memory on Ubuntu, which seems to imply that Linux is allocating less memory to R than Windows does. That is very unintuitive to me.

Is there a way to check how much memory is allocated to R on Linux, and possibly increase it, similar to memory.limit() on Windows? Or does anyone have another idea that could explain why the exact same script runs on my Windows partition but not on my Linux one?

The dataset is pretty big, so I'm not sure how to share a meaningful example, and in any case I'm more interested at this point in why the difference exists between the two systems. In case it helps, below is the code I am trying to run; it errors out when I try to change the variable types.
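For reference, this is a minimal sketch of how the memory visible to R could be inspected on Linux. The helper `meminfo_gib` is a hypothetical name (it is not part of base R), and the sketch assumes the standard `/proc/meminfo` file is present, as it is on Ubuntu. It only reports what the system has in RAM and swap; it does not by itself explain the difference from Windows.

```r
# Hypothetical helper (not part of base R): return one /proc/meminfo field in GiB.
# /proc/meminfo reports these values in kB (KiB).
meminfo_gib <- function(field, path = "/proc/meminfo") {
  lines <- readLines(path)
  hit <- grep(paste0("^", field, ":"), lines, value = TRUE)
  if (length(hit) == 0L) return(NA_real_)
  # Keep only the number that follows "Field:"
  kib <- as.numeric(sub("^[^:]+:[[:space:]]*([0-9]+).*$", "\\1", hit[1]))
  kib / 1024^2
}

if (file.exists("/proc/meminfo")) {
  cat("MemTotal     (GiB):", meminfo_gib("MemTotal"), "\n")
  cat("MemAvailable (GiB):", meminfo_gib("MemAvailable"), "\n")
  cat("SwapTotal    (GiB):", meminfo_gib("SwapTotal"), "\n")
}

# Memory currently used by this R session (the "used" and "max used" columns)
print(gc())
```

Comparing `SwapTotal` with the size of the Windows page file might be a useful first check, since the two systems can be set up with very different amounts of swap on the same hardware.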
library(data.table)

path <- getwd()
# pattern is a regular expression, so escape the dot and anchor it to the end of the name
file.names <- dir(path, pattern = "\\.txt$")
# Use fread (part of the data.table package) within lapply to import the files into a list
datalist <- lapply(file.names,
                   function(x) fread(x,
                                     header = FALSE,
                                     sep = ",",
                                     skip = 1,
                                     stringsAsFactors = TRUE,
                                     col.names = c("User_ID", "Rating", "Rating_Date")))
# Use rbindlist to combine the list into one data.table; idcol takes a single column name,
# and because datalist is unnamed the id column holds each file's position in file.names
df <- rbindlist(datalist, idcol = "Movie_ID")
rm(datalist, file.names, path)
colnames(df) <- c("Movie_ID", "User_ID", "Rating", "Rating_Date")
# Convert the ID columns to factors (this is the step that errors out)
df$User_ID <- as.factor(as.character(df$User_ID))
df$Movie_ID <- as.factor(as.character(df$Movie_ID))
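To make the conversion step easier to reason about, here is a small sketch on synthetic data (the `toy` table and its sizes are made up for illustration, and data.table is assumed to be installed). It measures the temporary character vector that `as.character()` creates, and shows a by-reference conversion with `:=` that skips that intermediate. Whether this intermediate copy is what pushes the real script over the limit is only a guess.

```r
library(data.table)

set.seed(1)
# Synthetic stand-in for df: 100 movies with 1000 ratings each (made-up sizes)
toy <- data.table(Movie_ID = rep(1:100, each = 1000),
                  User_ID  = sample(1e5, 1e5, replace = TRUE))

# Size of the temporary character vector that as.character() would allocate
print(object.size(as.character(toy$User_ID)), units = "MB")

# Convert by reference, without the as.character() step.
# Note: the level labels are the same, but integer input sorts the levels
# numerically, whereas character input sorts them alphabetically.
toy[, User_ID := as.factor(User_ID)]
toy[, Movie_ID := as.factor(Movie_ID)]

str(toy)
```

Running the `object.size()` line on one column of the real data on both systems would show how large the temporary allocation is at the point where the script fails.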
Edit to address whether this is a duplicate: I think it is related to that question, but it is definitely not a duplicate. The main difference is that I am trying both to understand how to check available memory in R on Linux (which that post may address) and to understand why Linux would run out of memory when Windows doesn't on the same script.