I am trying to use the ff package to read a large CSV file in R.

I first read the first 10 rows of the file with read.csv to make sure I'm not doing anything stupid:

trainFileName = "./TrainingSet/SplitFiles/7_train_data.csv"
trainSet <- read.csv(trainFileName, header=TRUE, nrows=10)
length(trainSet[1,])
length(trainSet[,1])

This tells me:

> trainFileName = "./TrainingSet/SplitFiles/7_train_data.csv"
> trainSet <- read.csv(trainFileName, header=TRUE, nrows=10)
> length(trainSet[1,])
[1] 4505
> length(trainSet[,1])
[1] 10

So far so good. Now I try to repeat this feat with ff:

trainSet <- read.csv.ffdf(file = trainFileName, header = TRUE, nrows = 10, VERBOSE = TRUE)

And here we fail with:

read.table.ffdf 1..10 (10)  csv-read=0.552sec
Error in if (dfile == getOption("fftempdir")) finalizer <- "delete" else finalizer <- "close" : argument is of length zero
Error in setwd(cwd) : character argument expected

I can't find any more info on this error anywhere, and I can't see how I could make the call any simpler, so before I delve into the ff source, does anyone have any ideas?

I've tried loading the whole file instead of just the first 10 rows, and I've tried specifying the column data types; I always get the same error.

Thanks in advance.

Stephan
  • The problem could be [that you have so many columns](https://stat.ethz.ch/pipermail/r-sig-hpc/2010-April/000606.html). – Roland Sep 05 '12 at 14:55

1 Answer

Yes, you have too many columns. In ff, each column is stored in its own file, and you cannot open more files than your operating system allows to be open at the same time.
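
To check that limit on your system, something like the following should work (a sketch assuming a Unix-alike such as Linux or macOS; ulimit -n is a shell builtin, so we ask a shell for it):

## Query the per-process limit on open files (Unix-alikes only);
## on many Linux systems the default is 1024.
system("ulimit -n", intern = TRUE)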

To see at what point ff fails when too many files are open, run this:

require(ff)

## Each ff vector is backed by its own file on disk, so this loop
## eventually hits the open-file limit.
x <- list()
for(i in 1:100000){
  print(i)
  x[[i]] <- ff(rnorm(10))
  open(x[[i]])
}

For me, this failed at 1022 open files, but I already had a few other files open as well.
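
If you don't actually need all 4505 columns, one possible workaround (an untested sketch) is to skip columns at read time so that ff creates fewer files. This assumes read.csv.ffdf passes colClasses through to the underlying read.csv, where a class of "NULL" drops a column:

library(ff)

## Hypothetical example: keep only the first 1000 of the 4505 columns.
## NA lets read.csv guess the type; "NULL" skips the column entirely.
keep  <- 1000
total <- 4505
cc    <- c(rep(NA, keep), rep("NULL", total - keep))

trainSet <- read.csv.ffdf(file = trainFileName, header = TRUE,
                          colClasses = cc, VERBOSE = TRUE)

Alternatively, if your system allows it, you can raise the open-file limit before starting R (e.g. ulimit -n 8192 in the shell on Linux).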