
I'm encountering a strange error: I've created a CSV file (~450 MB) on an Ubuntu 16.04 machine with R version 3.2.3 and 4 GB of memory. Every time I try to read.csv() this file on a machine with 8 GB and more recent versions of Ubuntu/R (Ubuntu 16.10 with R version 3.3.1 (64-bit) or Ubuntu 17.04 with R version 3.3.2 (64-bit)), it fails with Error: memory exhausted (limit reached?), or with Error: cannot allocate vector of size 1.3 Mb when I drastically increase ulimit -s before running R.

The file is here: https://mega.nz/#!ZMs0TSRJ!47DCZCnE6_FnICUp8MVS2R9eY_GdVIyGZ5O9TiejHfc

$ ls -ahl trainWhole.csv
-rw------- 1 gyu gyu 462M Mar 14 10:11 trainWhole.csv

$ R --version
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
...

$ R --no-save --no-restore-data --quiet

11:00:38 R > sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu Zesty Zapus (development branch)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

11:19:01 R > df <- read.csv("./trainWhole.csv")
Error: memory exhausted (limit reached?)

I've tried modifying my system's limits, with no luck:

$ ulimit -s 819200
$ R --no-save --no-restore-data --quiet
11:00:59 R > df <- read.csv("./trainWhole.csv")
Error: cannot allocate vector of size 1.3 Mb
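
Besides the shell's stack limit, R has its own startup memory options (documented in ?Memory). Below is only a sketch of knobs one could raise and things to inspect; I don't know whether any of them is the actual culprit here:

$ ulimit -s unlimited
$ R --max-ppsize=500000 --min-vsize=512M --no-save --no-restore-data --quiet

R > Cstack_info()   # the C stack limit R actually picked up from the shell
R > gc()            # current cons-cell ("Ncells") and vector-heap ("Vcells") usage
R > df <- read.csv("./trainWhole.csv")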

I found a few similar questions on SO, but no working solution:
  • Error: cannot allocate vector of size X Mb in R
  • R memory management / cannot allocate vector of size n Mb
  • Error: memory exhausted (limit reached?) (R error)
  • Error: cons memory exhausted (limit reached?)

I first posted the question on R-devel, but they redirected me here...

I don't think there's actually a memory limit problem, since the file is ~500 MB and I have 8 GB of RAM, with almost nothing running apart from Firefox and R:

$ free
              total        used        free      shared  buff/cache   available
Mem:        8043928     1385776     4808796       95288     1849356     6303680
Swap:       4194300           0     4194300

Since everything works smoothly on my Ubuntu 16.04/4 GB machine, I'll try to save my data as an RData file there; I'm almost sure that will work, but I'd love to have an explanation/solution for loading CSV files on more recent versions of R/Ubuntu...
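
For reference, this is the kind of round trip I have in mind (just a sketch; the .RData/.rds file names are placeholders):

# on the Ubuntu 16.04/4 GB machine, where read.csv() works
df <- read.csv("./trainWhole.csv")
save(df, file = "trainWhole.RData")       # classic .RData workspace image
saveRDS(df, "trainWhole.rds")             # single-object alternative

# on the newer machine
load("trainWhole.RData")                  # restores `df` into the workspace
df <- readRDS("trainWhole.rds")           # readRDS() returns the object instead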

PS: I've just tried it, and the CSV file loads perfectly on a first-generation 7" eeePC with 2 GB of RAM running Kali Linux...

user2115112
  • You never told us how large `trainWhole.csv` is. How big is it? How many rows and columns does it have? – Tim Biegeleisen Mar 27 '17 at 10:14
  • Have you tried fread from the data.table package? – Kristoffer Winther Balling Mar 27 '17 at 10:57
  • @KristofferWintherBalling Good idea, `fread` will probably solve the problem, but the question is why `read.csv` fails on some systems or with different R versions. I could imagine there is a problem with parsing many values, or a special double value (I have compared `read.csv` and `fread`, and about 500 cells have different internal values due to small precision variances that could be explained by different parsing algorithms). – R Yoda Mar 27 '17 at 16:26
  • @user2115112 I could not reproduce the problem on my computer (Ubuntu 14.04 with R 3.3.3 in RStudio). Would it be possible to install the exact working R version on another computer where `read.csv` fails? Could there possibly be a problem when you copied the file? Could you please try to read only the first n rows using the `nrows` parameter and find the limit by "bisecting" nrows? (A sketch of this and the `fread` suggestion follows these comments.) – R Yoda Mar 27 '17 at 16:31
  • @user2115112 Short update: I have checked https://bugs.r-project.org/bugzilla3/ but could not find any open or already closed bug related to what you describe. – R Yoda Mar 27 '17 at 23:51
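
A minimal sketch of the two suggestions from the comments above, using the file name from the question (whether either one avoids the error here is untested):

# 1) data.table::fread as an alternative reader
library(data.table)
dt <- fread("./trainWhole.csv")

# 2) probe read.csv() with nrows to narrow down where it fails
df_head <- read.csv("./trainWhole.csv", nrows = 1000)
str(df_head)                                   # check the inferred column classes
df_part <- read.csv("./trainWhole.csv", nrows = 100000)
# keep doubling/halving nrows until the smallest failing value is found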

1 Answer


Try deleting several of the library() calls in your setup. Some of those libraries are already masked by a bigger library, and loading them causes a lot of issues, not the actual dataset (at least that was the case for me). After I removed several lines of library setup, I was able to knit the R Markdown. Hope that helps.
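
One way to test whether the startup libraries, rather than the data, are to blame is to start a clean session with no profile loaded and retry the read; a sketch, assuming the same file as in the question:

# start R without site/user profiles, so no library() calls from .Rprofile run:
#   $ R --vanilla
search()                              # should only list the base packages
loadedNamespaces()                    # namespaces already loaded
df <- read.csv("./trainWhole.csv")    # retry the read in the clean session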

minq
  • I did not expect this, but indeed I had the habit of loading a whole bunch of libraries when starting R. Removing shiny, leaflet and others made this memory error disappear. – Xavier Prudent Jul 29 '21 at 18:44