
I am trying to read a file representing a numeric matrix with 4.5e5 rows and 2e3 columns. The first line is a header with ncol+1 words, and each subsequent row begins with a row name. As a text file it is around 17 GB. I tried using:

read.table(fname, header=TRUE)

but the operation ate all 64 GB of RAM available. I assume it loaded the data into the wrong structure.

People usually discuss speed; is there a way to import the file so it fits in memory properly? Performance is not a primary issue.

EDIT: I managed to read it with read.table:

# one character column for the row names, followed by 2000 numeric columns
colclasses <- c("character", rep("numeric", 2000))
betas <- read.table(beta_fname, header = TRUE, colClasses = colclasses, row.names = 1)

But the documentation still recommends scan for memory efficiency. What would the scan alternative look like?
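For reference, here is a minimal, untested sketch of what a scan-based read might look like, assuming the same whitespace-delimited layout as above (one row-name column followed by 2000 numeric columns) and the beta_fname from the snippet:

# Read the header line first, then the data records from the same connection.
con <- file(beta_fname, open = "r")
header <- scan(con, what = character(), nlines = 1, quiet = TRUE)
dat <- scan(con,
            what = c(list(character()), rep(list(numeric()), 2000)),
            quiet = TRUE)
close(con)

# Assemble a plain numeric matrix (much more compact than a data frame of
# heterogeneous columns), using the first field of every row as its name.
betas <- do.call(cbind, dat[-1])
rownames(betas) <- dat[[1]]
colnames(betas) <- header[-1]

Note that building the matrix with cbind briefly needs a second copy of the numeric data, so peak memory is roughly twice the size of the final object.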

Cindy Almighty
  • This is likely a duplicate of https://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes, or the answer can be found there. – Sirius Mar 22 '21 at 17:53

1 Answer


There are several things you might try. A search for reading large files in R will point you to fread in the data.table package, which is considerably more memory-frugal than read.table. You could also try readr's read_delim_chunked, which processes the file in pieces. Alternatively, break the file into smaller chunks, read each one in, and write it out as an RDS file; when that is done you can read the RDS files back and combine them using less space. Rough sketches of both approaches follow.
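Untested against your file, but the fread route could look roughly like this (beta_fname is borrowed from the question; fread guesses the delimiter and column types itself, and the row names end up as an ordinary first column rather than rownames):

library(data.table)

# fread auto-detects the separator and column types and uses far less
# memory than read.table for a file this size.
betas <- fread(beta_fname, header = TRUE)

And a sketch of the chunked route with readr, writing each chunk to an RDS file so that only one chunk is in memory while parsing. The delimiter, chunk size, and directory name here are assumptions you would adjust to your file:

library(readr)

dir.create("beta_chunks", showWarnings = FALSE)

# Write every chunk straight to disk; SideEffectChunkCallback discards the
# chunk after the callback runs, so memory stays at roughly one chunk.
write_chunk <- function(x, pos) {
  saveRDS(x, file.path("beta_chunks", sprintf("chunk_%09d.rds", pos)))
}

read_delim_chunked(beta_fname,
                   callback = SideEffectChunkCallback$new(write_chunk),
                   delim = " ",        # assumed separator; use "\t" if tab-delimited
                   chunk_size = 50000)

# Recombine later; rbind still needs room for the full table, but the
# expensive text parsing has already been paid for chunk by chunk.
files <- list.files("beta_chunks", full.names = TRUE)
betas <- do.call(rbind, lapply(files, readRDS))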