I want to write a function that accepts a directory and reads 100 CSV files from it. For each file, I want to store the file Id and the number of rows with no missing data in a data frame ("df" in my code). When done, the data frame should have two variables (columns) and 100 rows.

Here is my code. It runs, but when it completes the data frame contains only one row, for the last file processed, instead of one row per file. It appears my data frame is being overwritten on each pass through the loop.

myFunction <- function(directory) {
    df <-- data.frame(Id=integer(),nobs=integer())

    for(i in 1:100) {
        fileName <- sprintf("%03d.csv",i)
        rowData <- read.csv(paste(directory,fileName,sep=""),header=T)
        completeCases = rowData[complete.cases(rowData),]

        df <- rbind(c(i,length(completeCases[[1]])))
    }

    df
}
Randy Minder
  • It seems you forgot to use `df` inside `rbind` -- `rbind(df, c(...))`. Note, though, that instead of allocating new space for `df` in each iteration (and copying `df`), you could allocate a `mylist = vector("list", 100)` and use `mylist[[i]] = c(i, length())` in each iteration; lastly use `do.call(rbind, mylist)`. The general pattern is similar to [this](http://stackoverflow.com/questions/23190280/issue-in-loading-multiple-csv-files-into-single-dataframe-in-r-using-rbind). Also, if each iteration returns a known `length` pre-allocate a "data.frame" with the proper amount of rows and fill. – alexis_laz Jun 29 '16 at 20:26
  • Yes, much better to use `dir` or `list.files` and then `lapply`. – joran Jun 29 '16 at 20:29
  • @alexis_laz - This worked. If you create this as an answer, I'll accept it. – Randy Minder Jun 29 '16 at 20:29
  • @RandyMinder : These: [here](http://stackoverflow.com/questions/3642535/creating-an-r-dataframe-row-by-row) and [here](http://stackoverflow.com/questions/4034059/iteratively-constructed-dataframe-in-r) seem more extensive – alexis_laz Jun 29 '16 at 20:33
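
For anyone landing here later, the two suggestions from the comments can be sketched roughly as follows. This is only an illustration, not a tested drop-in replacement: the function names `count_complete_loop` and `count_complete_list` and the `ids` argument are made up for the example, and `file.path` is used in place of `paste(..., sep="")` on the assumption that `directory` is passed without a trailing slash. Counting rows with `sum(complete.cases(...))` is a substitution for the original `length(completeCases[[1]])` and should give the same number.

```r
# Minimal fix: pass the accumulated df back into rbind on every iteration.
# (hypothetical function name; ids defaults to the 100 files in the question)
count_complete_loop <- function(directory, ids = 1:100) {
    df <- data.frame(Id = integer(), nobs = integer())
    for (i in ids) {
        # assumes files are named 001.csv, 002.csv, ... as in the question
        path <- file.path(directory, sprintf("%03d.csv", i))
        rowData <- read.csv(path, header = TRUE)
        # rbind(df, ...) appends a row; rbind(...) alone replaced df each time
        df <- rbind(df, data.frame(Id = i, nobs = sum(complete.cases(rowData))))
    }
    df
}

# Alternative from the comments: collect one row per file in a
# pre-allocated list, then bind everything once at the end.
count_complete_list <- function(directory, ids = 1:100) {
    rows <- vector("list", length(ids))
    for (k in seq_along(ids)) {
        path <- file.path(directory, sprintf("%03d.csv", ids[k]))
        rowData <- read.csv(path, header = TRUE)
        rows[[k]] <- data.frame(Id = ids[k], nobs = sum(complete.cases(rowData)))
    }
    do.call(rbind, rows)
}
```

Both versions should return one row per file. The list version avoids copying the growing data frame on every iteration, which matters more as the number of files increases.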
