Having trouble loading a large text file; I'll post the code below. The file is ~65 GB and pipe-delimited ("|"). I have 10 of these files, and the process below has worked for 9 of them; only this last one is giving me trouble. Note that about half of the other nine files are larger than this one, around 70 GB, so size alone doesn't seem to be the issue.
# Libraries I'm using
library(readr)
library(dplyr)
# Function to keep only the rows I'm interested in.
# Note: x[[41]] extracts column 41 as a plain vector; x[, 41] on a tibble
# returns a one-column tibble, not the logical vector filter() expects.
f <- function(x, pos) filter(x, x[[41]] == "CA")
# Reading in the file.
# Note that this has worked for 9/10 files.
tax_history_01 <- read_delim_chunked(
  "Tax_History_148_1708_07.txt",
  delim = "|",
  col_types = cols(`UNFORMATTED APN` = col_character()),
  callback = DataFrameCallback$new(f),
  chunk_size = 1000000
)
This is the error message I get:
Error: cannot allocate vector of size 81.3 Mb
Error during wrapup: could not allocate memory (47 Mb) in C function 'R_AllocStringBuffer'
If it helps, Windows says the file is 69,413,856,071 bytes and readr's progress indicator reaches 100% at 66198 MB, which matches that byte count, so the failure seems to happen at or just after the end of the file. I've done some searching and haven't found an explanation. My hunch is that something is wrong with the file itself, e.g. a missing delimiter or an unclosed quote that makes the parser treat a huge stretch of the file as one field.
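To test that hunch, a base-R scan along these lines might help (a rough sketch: the chunk size and the "suspiciously long line" threshold are arbitrary, it assumes the first line is a header, and it will flag false positives if quoted fields can legitimately contain "|"). It counts the delimiters on each line and reports lines that don't match the header:

# Sketch: look for lines with the wrong number of "|" delimiters,
# or implausibly long lines (the length threshold is a guess).
con <- file("Tax_History_148_1708_07.txt", open = "r")
header <- readLines(con, n = 1)
pipes_expected <- nchar(header) - nchar(gsub("|", "", header, fixed = TRUE))
line_no <- 1  # the header is line 1
repeat {
  chunk <- readLines(con, n = 1000000, warn = FALSE)
  if (length(chunk) == 0) break
  pipes <- nchar(chunk) - nchar(gsub("|", "", chunk, fixed = TRUE))
  bad <- which(pipes != pipes_expected | nchar(chunk) > 100000)
  if (length(bad) > 0)
    print(data.frame(line = line_no + bad, delims = pipes[bad]))
  line_no <- line_no + length(chunk)
}
close(con)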
Edit: Below is a small sample of the resources I consulted. More specifically, what's giving me trouble is the second error, "Error during wrapup: ... in C function 'R_AllocStringBuffer'" - I can't find much on it. As far as I can tell, R_AllocStringBuffer is the internal routine R uses to grow the buffer for a single string, which would fit the parser trying to assemble one enormous field.
Some of the language in this post led me to believe that the limit on the size of a single string had been reached and that there is possibly a parsing error: R could not allocate memory on ff procedure. How come?
I saw this post, but it seems I'm facing a different issue; for me it's not really a calculation problem: R memory management / cannot allocate vector of size n Mb
I referred to this post regarding cleaning up my workspace. That's not really an issue within a single import, but it's good practice in the script that imports all 10 files (sketched below): Cannot allocate vector in R of size 11.8 Gb
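For reference, the cleanup between imports looks roughly like this (a sketch only: the file-name pattern and the output paths are made up, and f is the filter callback from above):

# Hypothetical loop over the 10 files; drop each result and force a
# garbage collection before starting the next ~70 GB import.
paths <- sprintf("Tax_History_148_1708_%02d.txt", 1:10)  # made-up pattern
for (p in paths) {
  res <- read_delim_chunked(
    p,
    delim = "|",
    col_types = cols(`UNFORMATTED APN` = col_character()),
    callback = DataFrameCallback$new(f),
    chunk_size = 1000000
  )
  saveRDS(res, sub("\\.txt$", "_CA.rds", p))  # made-up output name
  rm(res)
  gc()
}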
Just more topics related to this: R Memory "Cannot allocate vector of size N"
Found this too, but it doesn't help because my machine is restricted due to data privacy: https://rpubs.com/msundar/large_data_analysis
Just reading up on general good practices:
http://adv-r.had.co.nz/memory.html
http://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html