
I'm trying to open a rather large file in R to do analyses on its data. Currently, what I have is:

x = data.table::fread("file_name_here")

When I run this line, I get:

"Error: cannot allocate vector of size 2.6 Mb".

When I run the line without assigning the result to a variable:

data.table::fread("file_name_here")

I instead get:

"Error in writeBin(bfr, con = out, size = 1L) :
'R_Calloc' could not allocate memory (10000000 of 1 bytes)".

I have tried doing:

Sys.setenv("VROOM_CONNECTION_SIZE" = 500000000)

But it didn't fix anything. I also cannot use memory.limit() because it says it's no longer supported. Are there any other ways I can open this large file in R? Note that I do need to analyze all of its contents, so I cannot trim down the file. Also, I think I need to stick with fread, because reading the file in other ways messed up the formatting of its contents (testing the code on a smaller but similar file, I know fread keeps the formatting consistent and correct).

  • You don't have sufficient memory to import this data. Get sufficient memory (buy additional RAM) or import and process the data in chunks (a chunked-`fread` sketch follows these comments). – Roland May 25 '23 at 10:54
  • 2
    You could try the `ff` package which allows to read larger-than-memory files: https://cran.r-project.org/web/packages/ff/index.html Or try the `arrow`-package (https://arrow.apache.org/docs/r/) which apparently also allows this. – shghm May 25 '23 at 11:00
  • I second the suggestion for `arrow::read_csv_arrow`. When used with `dplyr`, it does "lazy" reading of the data, allowing you to filter, do some mutate/summary operations, even some grouping, before the eventual `collect` operation which is when the data is _actually_ read from the file. The premise here is that your `filter` and other operations will reduce the size of your data sufficiently so that it will fit in your available memory. – r2evans May 25 '23 at 11:12
  • Another option, perhaps, though not as likely to work completely: reduce the columns as well using `fread(.., select=...)` (sketch after the comments). This will really only work if there are a lot of columns and you need only a few of them. (Even then it may not be enough.) – r2evans May 25 '23 at 11:14
  • @r2evans I did what you suggested and I'm getting the message "Error: cannot allocate vector of size 64 Kb" – Feynman May 25 '23 at 19:20
  • 1
    Which did you try? If you tried `read_csv_arrow`, then you are not filtering your data down enough to fit inside your memory. _No package_ is going to let you know more data than you have memory. You can only hope to either (a) do all reduction outside of R in more memory-frugal mechanisms (including `arrow`/parquet and I suspect `ff`, (b) work on a **small** subset of rows at a time, or (c) buy more memory. Find a way to split your data such that you can get the aggregations you need. – r2evans May 25 '23 at 19:34
  • Okay. Currently I'm using a rental laptop with 8GB because I had to send my Asus laptop to the shop to get it repaired. Should a 16GB laptop be sufficient? – Feynman May 25 '23 at 20:08
  • 2
    We can't tell how much memory you need because we don't know anything about your data set or what you're trying to do with it. The first two options provided by @r2evans are your only choice. You said "doing it [i.e., reading the file] in other ways caused the formatting of the file's contents to get messed up", but we don't know what that means so we can't help with it. – Ben Bolker May 25 '23 at 21:07
  • Have you tried setting `R_MAX_VSIZE` in your `.Renviron` file, as in [this answer](https://stackoverflow.com/q/51295402/570918)? – merv May 25 '23 at 23:17
  • @merv Forget that. If you don't have enough memory on the machine, you don't have enough memory on the machine. And nowadays 8 GB is often not enough for a machine intended for data analyses. You can't change that fact by any R settings. If OP is running a *nix system and has a fast SSD, maybe swap memory could be used. But OP doesn't even tell us how much disk space the file occupies. – Roland May 26 '23 at 06:01
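
A minimal sketch of the `fread(select=)` suggestion from the comments, assuming the file is CSV-like; `col_a`, `col_b`, and `col_c` are hypothetical names standing in for the handful of columns actually needed:

    library(data.table)

    # "file_name_here" is the placeholder path from the question; the column
    # names are hypothetical. Reading only the columns you need can cut memory
    # use substantially if the file is wide.
    x <- fread("file_name_here", select = c("col_a", "col_b", "col_c"))

As noted above, this only helps when most columns can be dropped.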
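
A rough sketch of the chunked approach Roland and r2evans describe (option (b) above), assuming a regular CSV with a header row and no embedded newlines inside quoted fields; the chunk size and the placeholder aggregation are invented for illustration:

    library(data.table)

    path  <- "file_name_here"               # placeholder path from the question
    chunk <- 1e6L                           # rows per pass -- tune to available RAM
    cols  <- names(fread(path, nrows = 0L)) # read only the header row

    skip    <- 0L
    results <- list()                       # keep only small per-chunk summaries here
    repeat {
      dt <- tryCatch(
        fread(path, skip = skip + 1L, nrows = chunk,
              header = FALSE, col.names = cols),
        error = function(e) data.table()    # skipping past the end of file -> stop
      )
      if (nrow(dt) == 0L) break

      # ... summarise/aggregate dt here; store only the reduced result ...
      results[[length(results) + 1L]] <- dt[, .N]  # placeholder: row count per chunk

      skip <- skip + nrow(dt)
    }

Each pass has to locate its starting line from the top of the file, so this is slow for many chunks, but memory use stays bounded by the chunk size.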

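For the lazy `arrow` route that shghm and r2evans suggest, one way to get the filter-before-read behaviour they describe is `arrow::open_dataset()` on the CSV; the column names (`year`, `amount`) and the filter are purely hypothetical:

    library(arrow)
    library(dplyr)

    # "file_name_here" is the placeholder path; `year` and `amount` are hypothetical columns.
    ds <- open_dataset("file_name_here", format = "csv")  # nothing is loaded into R yet

    result <- ds |>
      filter(year >= 2020) |>               # pushed down to the file scan
      group_by(year) |>
      summarise(total = sum(amount)) |>
      collect()                             # only the reduced result lands in R memory

This only helps if the filtering/aggregation shrinks the data enough to fit in RAM, which is exactly the caveat r2evans raises above.
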
0 Answers