I have a simple piece of R code that looks like this:

for(B in 1:length(Files)){
    InputDaten[,B] <- read.table(Files[B], header=FALSE, dec=".", skip=12, sep=",", colClasses=c("numeric"))
}

I am reading 1.39 GB of files into memory so that I can process them. However, reading them takes about an hour. When I watch the occupied memory, it increases only about every 10 minutes, and only in the last two minutes does it increase linearly with time. Why might that be? Can I make it faster?

Edit 1

InputDaten<-data.frame(c(1:15360),444)

This is how I initialised InputDaten.

I have now tried fread, and the result looks the same. Here is a screenshot of the memory usage from when I started fread; the memory usage doesn't increase at all for a while. (fread was started at approximately the middle of the timeframe.)

http://pic-hoster.net/upload/57790/Unbenannt.png

  • In my experience, `fread` from the `data.table` package is super fast. It's what I used for reading large files. – rrs Jun 23 '14 at 15:09
  • Have a wee look at [mnel's post](http://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r/1820610#1820610) – user20650 Jun 23 '14 at 15:13
  • Also, it is unclear from your example code how you've initialized `InputDaten`, the detail of which might indicate a large amount of copying each time you read in a new file. – joran Jun 23 '14 at 15:42
  • Initializing it that way makes a data.frame of 2 columns. That means every time B > 2 you have to copy the entire object (which takes longer as it grows) – Señor O Jun 23 '14 at 16:07
  • Check the speed of just the read.table line without assigning anything – Señor O Jun 23 '14 at 16:07
  • `read.table` or `fread` on its own takes only a few minutes. I have tried initialising with `InputDaten <- as.data.frame(matrix(nrow = 15360, ncol = length(Files)))`; the behaviour of the memory usage is the same. – user3767945 Jun 23 '14 at 16:22
  • This has nothing to do with the speed of `read.table`. First tip: `InputDaten[[B]] <-`. You should use `lapply(files, read.table)`. Plus you have two of the default arguments listed in the function call. – Rich Scriven Jun 23 '14 at 16:37
  • I would suggest reading the "Memory Usage" section in `?read.table` – Rich Scriven Jun 23 '14 at 17:15
  • lapply was the solution! Thanks Richard! – user3767945 Jun 23 '14 at 20:22
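
The `lapply` approach that resolved this could look roughly like the sketch below. It is not the asker's actual code: it assumes each file holds a single numeric column after 12 header lines, and the temporary demo files and the name `columns` are made up for illustration.

```r
# Hypothetical demo data: three small files, each with 12 header lines
# followed by one numeric value per line (an assumed file layout).
tmp_dir <- tempfile("demo_")
dir.create(tmp_dir)
n_rows <- 5
for (i in 1:3) {
  path <- file.path(tmp_dir, sprintf("file%d.csv", i))
  writeLines(c(rep("header line", 12), as.character(i * seq_len(n_rows))), path)
}
Files <- list.files(tmp_dir, full.names = TRUE)

# Read each file into its own list element. Unlike assigning into a
# data.frame inside a for loop, this never modifies a growing object.
columns <- lapply(Files, function(f) {
  read.table(f, skip = 12, sep = ",", colClasses = "numeric")[[1]]
})
names(columns) <- basename(Files)

# Build the data.frame once, after all files have been read.
InputDaten <- as.data.frame(columns)
```

If reading itself becomes the bottleneck, `data.table::fread(f, skip = 12)` could probably be swapped in for `read.table` inside the function, as the first comment suggests.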
