Improve/Avoid for-Loop in R

Question

I have several files of measured data, which I want to open automatically, take some values out and put them all together in one dataframe.

First I search for the filenames, open them one by one (in a for loop) and put them together. The code works fine, but as there are a lot of files, it takes way too long. At the moment I can't think of any other way to do this… My question is: is there a way to speed up the process, maybe without using loops? Especially avoiding the second loop would improve the performance.

I tried to make a minimal example of the code. Some lines (such as data_s) don't make a lot of sense in this example, but in reality they do ;-)

all.files     <- list.files(recursive = TRUE)
df            <- data.frame(matrix(NA, nrow = 1000, ncol = 242))

for (i in seq_along(all.files)) {
    Data      <- read.table(all.files[i], header = FALSE)
    name      <- Data[i, 2]
    data_s    <- i + 6

    for (k in 1:240) {
        df[data_s + k, k + 2] <- Data[24 + k, 3]
    }

    assign(name, df)
    rm(name, df)
}

That's the structure of "Data":

That's how my final file ("df") should look:

Thanks a lot for your help!

You could probably read all your files into a list of data.frames with `mydfList <- lapply(..., read.csv)`, then do some cleaning in a second pass with `lapply`, or extend the function used in the first pass. Finally, combine the list into a single data.frame using `do.call(rbind, mydfList)` or similar. For example, see [this post](https://stackoverflow.com/questions/5758084/loop-in-r-loading-files) on reading files into a list of data.frames. — lmo, Oct 03 '17 at 16:04
Maybe I'm missing something, but it looks like you are overwriting your data in `df` in the second loop, as you shift `i`... — juan, Oct 03 '17 at 16:18
Thanks for the help! I always forget about lapply(). This one actually helped in combination with the answer from guscht! — nvw, Oct 05 '17 at 09:31
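
The `lapply` approach from the comments can be sketched as follows. This is only a minimal illustration: the demo files, the directory name `lapply_demo` and the helper `read_values` are made up for the example, and it assumes that one row per file is wanted in the result (the question does not make the final layout fully clear). The inner loop over `k` is replaced by a single vectorised index, `dat[25:264, 3]`.

```r
# Hypothetical demo data: three small files in a temporary directory,
# each with 264 rows and 3 columns (index, name, measured value).
demo_dir <- file.path(tempdir(), "lapply_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (j in 1:3) {
  demo <- data.frame(idx   = seq_len(264),
                     name  = paste0("series", j),
                     value = j * 1000 + seq_len(264))
  write.table(demo, file.path(demo_dir, sprintf("file%d.txt", j)),
              row.names = FALSE, col.names = FALSE)
}

all_files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read one file and pull out rows 25..264 of column 3 in one vectorised
# step (this takes the place of the inner loop over k).
read_values <- function(path) {
  dat <- read.table(path, header = FALSE)
  dat[25:264, 3]
}

# First pass: one numeric vector of length 240 per file
values_list <- lapply(all_files, read_values)

# Second pass: stack the vectors, giving one row per file and 240 columns
result <- as.data.frame(do.call(rbind, values_list))
rownames(result) <- basename(all_files)
```

Because each file is handled by the same function, the reading step can later be swapped for a faster reader without touching the combining step.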

guscht · Accepted Answer · 2017-10-03T16:21:39.577

I would use the data.table package and its fread function. It's much faster than read.table, and the syntax is generally nicer than the data.frame syntax. Your problem should be solved with something like this:

library(dplyr)      # for full_join
library(data.table) # for fread and nicer syntax
final <- data.table(dateandtime = character())
for (file in list.files(recursive = TRUE)) {
   new <- fread(file, stringsAsFactors = FALSE)
   final <- data.table(full_join(final, new, by = "dateandtime"))
}

EDIT1: Changed "left_join" to "full_join" to account for the case where the "dateandtime" observations differ between files.

EDIT2: Instantiated the "final" data.table with a column "dateandtime" to make the join work on the first element.

Sorry, I was a bit too fast. My problem is/was that I have zip files, which I opened with something like unz(zip.file1,all.files1inzipfile1), so this didn't really work with your option. After trying around, I decided to just unzip all the files first... Thanks for your help! — nvw, Oct 05 '17 at 09:20
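
As a possible variation on the accepted answer, the same full join can be written without dplyr and without an explicit loop, by reading all files with `lapply` and folding the list with `Reduce` and data.table's `merge` method. This is a sketch under assumptions: the two demo files and their columns `a` and `b` are invented for illustration, and every file is assumed to contain a `dateandtime` column, as in the answer.

```r
library(data.table)

# Hypothetical demo data: two small CSV files sharing a "dateandtime" column
demo_dir <- file.path(tempdir(), "fread_demo")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(dateandtime = c("t1", "t2"), a = 1:2),
       file.path(demo_dir, "a.csv"))
fwrite(data.table(dateandtime = c("t2", "t3"), b = 3:4),
       file.path(demo_dir, "b.csv"))

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list of data.tables
tables <- lapply(files, fread)

# Fold the list into one table; all = TRUE keeps rows whose "dateandtime"
# appears in only some of the files (a full outer join)
final <- Reduce(function(x, y) merge(x, y, by = "dateandtime", all = TRUE),
                tables)
```

Starting the fold from the first table also means no empty "final" table has to be created beforehand.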

score 0 · Answer 2 · answered Oct 03 '17 at 16:14

First I created some data as you described.

df <- diag(nrow = 10,ncol = 10)
df[df == 0] <- NA
df <- as.data.frame(df)
df

df$X <- 7


library(reshape2)

Then I used the function melt() from the reshape2 package:

melt(df,id.vars = "X",na.rm = TRUE)

I hope this helps.
