
Good evening,

I am attempting to load a dataset into R (~20 million rows, 140 columns, ~6.2 GB on disk) using either LaF with ffbase, or ff on its own. In either case the load fails.

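library(LaF)     # detect_dm_csv(), laf_open()
library(ffbase)  # laf_to_ffdf(); also loads ff

struct <- detect_dm_csv(file = '/scratch/proj.csv', header = TRUE)
colClasses <- struct$columns[, 2]  # column types detected by LaF
ldat <- laf_open(struct)
data <- laf_to_ffdf(ldat)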

or

data <- read.csv.ffdf(file = 'proj.csv', colClasses = colClasses, header = TRUE)

It chugs along for a bit and then prints a massive number of values such as 1L 1L 1L, which seem to correspond to variables.

It then lists the variables in the form variable_name = list(), followed by what look like traceback entries (5: ffdfappend(x, block) and 6: laf_to_ffdf(ldat)), and finally asks how I'd like to exit R.

I've tried capturing the output with sink(), but nothing gets written, presumably because the sink never gets closed, and the sheer volume of output overruns my scroll buffer.

Has anyone experienced this before?

More info: I ran the same script in a Windows 7 virtual machine and it completed fine. By luck I was able to see the error that precedes all the output; it says something about a "nonexistent physical address", which would seem to be mmap-related.

I'm going to try recompiling everything and see how it goes. If you have any further suggestions, please let me know!

  • Isn't colClasses here an nrow vector? (rather than ncol) – mdsumner Nov 22 '14 at 04:54
  • length(colClasses) = 141 (ncol). Further, if I use the structure generated by detect_dm_csv directly with the LaF function, it fails in much the same way as ff's read.csv.ffdf – Michael Chase Nov 22 '14 at 04:59
  • Can you make this into a reproducible example? It would be nice to see what caused the crash. From what you describe, it looks like your drive was full. I think the question is similar to this one: http://stackoverflow.com/questions/25910746/r-ffdfappend-sigbus-error. ff stores its data in files on disk instead of keeping everything in RAM, so make sure the drive it writes to has enough free space. –  Nov 25 '14 at 19:07
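The disk-space suggestion in the last comment can be checked with a few lines of base R. This is only a rough sketch: it assumes ff writes its backing files to the directory in the fftempdir option and falls back to the session temp directory when that option is unset, it relies on the Unix df command, and the /scratch/fftmp directory in the final comment is a made-up example.

```r
# Sketch only: check free space where ff would write its backing files.
# Assumption: ff uses getOption("fftempdir"), falling back to tempdir().
ff_dir <- getOption("fftempdir", default = tempdir())

# Size of the source CSV (path taken from the question; NA if not present).
csv_path <- "/scratch/proj.csv"
csv_gb <- if (file.exists(csv_path)) file.info(csv_path)$size / 1024^3 else NA_real_

# On Unix-like systems, `df -Pk` reports available space in 1 KiB blocks
# in the fourth column of its last output line.
free_gb <- NA_real_
if (.Platform$OS.type == "unix") {
  df_out <- system2("df", c("-Pk", shQuote(ff_dir)), stdout = TRUE)
  fields <- strsplit(trimws(df_out[length(df_out)]), "\\s+")[[1]]
  free_gb <- as.numeric(fields[4]) / 1024^2
}

cat(sprintf("ff directory: %s\nfree space: %.1f GB\nCSV size: %.1f GB\n",
            ff_dir, free_gb, csv_gb))

# To point ff at a larger volume, set the option before loading the data:
# options(fftempdir = "/scratch/fftmp")  # hypothetical directory
```

If the free space is smaller than, or close to, the CSV size, a full temp volume is a plausible cause of the crash, since the ffdf needs room of roughly that order.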

1 Answer


Have you tried data.table's fread?

Can you test:

library(data.table)
data <- fread(file = '/scratch/proj.csv', verbose = TRUE)

I have files of similar size, and with fread everything runs smoothly.
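Unlike ff, fread loads the whole table into RAM, so with a file this size it may be worth previewing a few rows and reading only the columns you need before attempting the full load. Below is a minimal sketch using fread's nrows and select arguments; the small temporary file and its column names (id, value, label) are stand-ins invented for illustration, to be replaced with the real path and columns.

```r
library(data.table)

# Tiny stand-in file so the sketch runs on its own; swap in the real path.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,value,label", "1,0.5,a", "2,1.5,b", "3,2.5,c"), csv_path)

# Read only the first rows to check the detected column types cheaply.
preview <- fread(csv_path, nrows = 2)
str(preview)

# Then read just the columns that are needed, which reduces memory use
# on wide files.
subset_dt <- fread(csv_path, select = c("id", "value"))
print(dim(subset_dt))
```

If the preview looks right, the same call without nrows reads the full file.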

Nikos