
I have an R script that I am attempting to run in RStudio on Ubuntu 18.04, which is dual-booted with Windows 10, with the following setup:

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.4 tools_3.4.4  

To start off, this script was originally written on Windows, where it ran fine (the reason I am trying it on Ubuntu is that I need a specific package that runs better on Linux). On Ubuntu, however, the same script fails with an out-of-memory error, which seems to imply that Linux is allocating less memory to R than Windows does, and that is very unintuitive to me. Is there a way to check how much memory is allocated to R, and possibly increase it, similar to memory.limit() on Windows? Or does anyone have another idea that could explain why the exact same script that runs on my Windows partition can't run on my Linux one? The dataset is pretty big, so I'm not sure how I can share a meaningful example, and regardless I'm more interested at this point in why the difference between the two exists. If it is helpful at all, below is the code I have attempted to run; it errors out when I try to change the variable types.
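For reference, since memory.limit() is Windows-only, the closest I've found on Linux so far is to inspect R's own allocations with gc() and the system's free memory from the shell (a sketch of what I've been running, assuming the proc filesystem is available):

```r
# R's own view of memory used so far (Ncells/Vcells, peak usage)
gc()

# System-wide memory as Linux sees it; "available" is what R could still claim
system("free -h")

# Total physical RAM in kB, read from the proc filesystem
as.numeric(system("awk '/MemTotal/ {print $2}' /proc/meminfo", intern = TRUE))
```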

library(data.table)

path <- getwd()
file.names <- dir(path, pattern = "\\.txt$")

# Use fread (from data.table) within lapply to import the files into a list
datalist <- lapply(file.names,
                   function(x) fread(x,
                                     header = FALSE,
                                     sep = ",",
                                     skip = 1,
                                     stringsAsFactors = TRUE,
                                     col.names = c("User_ID", "Rating", "Rating_Date")))

# Name the list elements after the files, then use rbindlist to bind the
# list into one data.table with an id column identifying the source file
names(datalist) <- file.names
df <- rbindlist(datalist, idcol = "Movie_ID")

rm(datalist, file.names, path)

colnames(df) <- c("Movie_ID", "User_ID", "Rating", "Rating_Date")

df$User_ID <- as.factor(as.character(df$User_ID))
df$Movie_ID <- as.factor(as.character(df$Movie_ID))
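The two conversions at the end are where it errors out; each `df$col <- ...` assignment forces full intermediate copies of the data. A lower-memory variant I could try uses data.table's by-reference assignment, which updates each column in place instead of copying (a sketch, not yet tested on the full dataset):

```r
library(data.table)

# := modifies the columns in place, avoiding the whole-table copy
# that df$col <- ... triggers on a large data.table
df[, User_ID := as.factor(as.character(User_ID))]
df[, Movie_ID := as.factor(as.character(Movie_ID))]
```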

Edit to address whether this is a duplicate: I think it is related to that question, but it is definitely not a duplicate. The main difference is that I am both trying to understand how to check available memory in R on Linux (which that post may address) and trying to understand why Linux would run out of memory on the same script when Windows doesn't.

user2355903
  • The fact that the code ran fine in Windows but not in Ubuntu – user2355903 Jul 06 '18 at 23:02
  • Edited question to address whether it was a duplicate – user2355903 Jul 06 '18 at 23:39
  • The part that you say is different, "understand why Linux would run out of memory when windows doesn't on the same script", is explaining a phenomenon we don't even really know exists. I'd suggest first making sure the same amount of memory is available, per the duplicated part. If you find that both have identical memory allocation, then show the commands and output to that effect, and show something indicating the GBs used in both OSes. – Hack-R Jul 07 '18 at 00:28

0 Answers