
I am trying to read a file representing a numeric matrix with 4.5e5 rows and 2e3 columns. The first line is a header with ncol+1 words, and each subsequent row begins with a row name. As a text file it is around 17 GB. I tried using:

read.table(fname, header=TRUE)

but the operation ate all 64 GB of RAM available. I assume it loaded the data into the wrong structure.

People usually discuss speed; is there a way to import the file so it fits in memory properly? Performance is not a primary issue.

EDIT: I managed to read it with read.table:

# one character column for the row names, followed by 2000 numeric columns
colclasses <- c("character", rep("numeric", 2000))
betas <- read.table(beta_fname, header = TRUE, colClasses = colclasses, row.names = 1)

But the documentation still recommends scan for memory efficiency. What would the scan alternative look like?
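For reference, here is a minimal, untested sketch of what a scan-based read might look like, assuming the same whitespace-delimited layout as above (one row-name column followed by 2000 numeric columns) and the beta_fname from the snippet:

# Read the header line first, then the data records from the same connection.
con <- file(beta_fname, open = "r")
header <- scan(con, what = character(), nlines = 1, quiet = TRUE)
dat <- scan(con,
            what = c(list(character()), rep(list(numeric()), 2000)),
            quiet = TRUE)
close(con)

# Assemble a plain numeric matrix (much more compact than a data frame of
# heterogeneous columns), using the first field of every row as its name.
betas <- do.call(cbind, dat[-1])
rownames(betas) <- dat[[1]]
colnames(betas) <- header[-1]

Note that building the matrix with cbind briefly needs a second copy of the numeric data, so peak memory is roughly twice the size of the final object.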

Cindy Almighty
  • This is likely a duplicate of https://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes, or the answer can be found there. – Sirius Mar 22 '21 at 17:53

1 Answer


There are several things you might try. A search for reading large files in R will point you to fread in the data.table package, which is considerably more memory-frugal than read.table. You could also try readr's read_delim_chunked, which processes the file in pieces. Alternatively, break the file into smaller chunks, read each one in, and write it out as an RDS file; when that is done you can read the RDS files back and combine them using less space. Rough sketches of both approaches follow.
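Untested against your file, but the fread route could look roughly like this (beta_fname is borrowed from the question; fread guesses the delimiter and column types itself, and the row names end up as an ordinary first column rather than rownames):

library(data.table)

# fread auto-detects the separator and column types and uses far less
# memory than read.table for a file this size.
betas <- fread(beta_fname, header = TRUE)

And a sketch of the chunked route with readr, writing each chunk to an RDS file so that only one chunk is in memory while parsing. The delimiter, chunk size, and directory name here are assumptions you would adjust to your file:

library(readr)

dir.create("beta_chunks", showWarnings = FALSE)

# Write every chunk straight to disk; SideEffectChunkCallback discards the
# chunk after the callback runs, so memory stays at roughly one chunk.
write_chunk <- function(x, pos) {
  saveRDS(x, file.path("beta_chunks", sprintf("chunk_%09d.rds", pos)))
}

read_delim_chunked(beta_fname,
                   callback = SideEffectChunkCallback$new(write_chunk),
                   delim = " ",        # assumed separator; use "\t" if tab-delimited
                   chunk_size = 50000)

# Recombine later; rbind still needs room for the full table, but the
# expensive text parsing has already been paid for chunk by chunk.
files <- list.files("beta_chunks", full.names = TRUE)
betas <- do.call(rbind, lapply(files, readRDS))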