Subset multiple dataframes in a loop in R

Question

I am trying to drop columns from over 20 data frames that I have imported. I can drop them when I hard-code an individual file name, but as soon as I try to loop through all of the files, I get errors. Here's the code:

path <- "C://Home/Data/"
files <- list.files(path=path, pattern="^.file*\\.csv$")

for(i in 1:length(files))
{
  perpos <- which(strsplit(files[i], "")[[1]]==".")
  assign(
    gsub(" ","",substr(files[i], 1, perpos-1)), 
    read.csv(paste(path,files[i],sep="")))
}

mycols <- c("test", "trialruns", "practice")

`file01` = `file01`[,!(names(`file01`) %in% mycols)]

So, the line above works and drops those three columns from file01. However, I can't figure out how to iterate through file02 to file20 and drop the columns from all of them. Any ideas? Thank you so much!
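If the data frames stay as separate objects, here is a minimal sketch of such a loop. It assumes objects named `file01`, `file02`, ... already exist in the global environment. `get()` fetches each data frame by its name string, and `assign()` writes the trimmed copy back under the same name. The toy data frames and the `df_names` vector are hypothetical stand-ins for the imported files.

```r
# Hypothetical stand-ins for the imported objects file01 ... file03
file01 <- data.frame(id = 1:2, test = 3:4, trialruns = 5:6, practice = 7:8, score = 9:10)
file02 <- data.frame(id = 1:2, test = 3:4, score = 9:10)
file03 <- data.frame(id = 1:2, practice = 7:8, score = 9:10)

mycols <- c("test", "trialruns", "practice")

# Hypothetical list of object names to process ("file01", "file02", "file03");
# ls(pattern = "^file\\d+$") is another way to collect them
df_names <- sprintf("file%02d", 1:3)

for (nm in df_names) {
  df <- get(nm)                                       # fetch the data frame by name
  df <- df[, !(names(df) %in% mycols), drop = FALSE]  # drop whichever listed columns exist
  assign(nm, df)                                      # overwrite the original object
}

names(file01)
# [1] "id"    "score"
```

`drop = FALSE` keeps the result a data frame even if only one column survives.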

Use lapply, keep all dataframes in a list. See this [post](http://stackoverflow.com/questions/11433432/importing-multiple-csv-files-into-r) for example. — zx8754, Sep 25 '16 at 19:46
Welcome to SO! Instead of referencing a file on your computer in the R tag we ask that you provide a reproducible example (you could use a builtin data set or share your code via `dput()`). — Hack-R, Sep 25 '16 at 19:51
I think using `setwd(path)` and `list.files(path = ".", pattern = "^.file.*\\.csv$", full.names = TRUE)` would make your life easier too — Nate, Sep 25 '16 at 19:52
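This is a sketch of the `full.names = TRUE` suggestion. It assumes files named like `file01.csv`, and the temporary demo directory is hypothetical. Because `list.files()` then returns full paths, `read.csv(files[i])` works without pasting `path` onto each name. The pattern `^file.*\\.csv$` is an assumed alternative, so adjust it to the real file names. In the question's `"^.file*\\.csv$"`, `^.` requires exactly one character before `fil`, and `e*` means "zero or more `e`", so a name that begins with `file` does not match.

```r
# Hypothetical demo directory with a few CSV files (stand-in for C://Home/Data/)
dir <- file.path(tempdir(), "so_demo")
dir.create(dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = i, test = i, score = i * 10),
            file.path(dir, sprintf("file%02d.csv", i)), row.names = FALSE)
}
writeLines("not a csv", file.path(dir, "notes.txt"))  # should be ignored

# "^file.*\\.csv$": starts with "file", anything in between, ends in ".csv"
files <- list.files(path = dir, pattern = "^file.*\\.csv$", full.names = TRUE)

basename(files)
# [1] "file01.csv" "file02.csv" "file03.csv"
```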
Thanks so much for the responses. @zx8754, I'm trying `lapply(files, subset, select = mycols)`, but I get the error "argument "subset" is missing, with no default". Any idea what the problem might be? — jayz323, Sep 25 '16 at 20:23
Maybe more like `lapply(files, function(x, selcols){x[,selcols]}, mycols)` — Jason, Sep 25 '16 at 20:51
@Jason: when I tried this, I got: `Error in x[, selcols] : incorrect number of dimensions` — jayz323, Sep 25 '16 at 21:02
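Both errors likely come from the same cause: `files` holds file names (character strings), not data frames. `subset()` on a character vector falls through to its default method, which needs a `subset` argument. `x[, selcols]` fails because a string has no dimensions. Here is a minimal sketch that applies the function to a list of data frames instead. The `dfList` contents are hypothetical stand-ins for what reading each file would return, and the function drops the listed columns (as the question wants) rather than keeping them.

```r
mycols <- c("test", "trialruns", "practice")

# Hypothetical stand-in for the list returned by reading each file
dfList <- list(
  file01 = data.frame(id = 1:2, test = 3:4, practice = 5:6),
  file02 = data.frame(id = 1:2, trialruns = 3:4, score = 7:8)
)

# Each x is now a data frame, so column indexing works;
# extra arguments after the function (here mycols) are passed on as dropcols
dropped <- lapply(dfList, function(x, dropcols) {
  x[, !(names(x) %in% dropcols), drop = FALSE]
}, mycols)

lapply(dropped, names)
# $file01
# [1] "id"
#
# $file02
# [1] "id"    "score"
```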

score 0 · Answer 1 · answered Sep 25 '16 at 20:55

As @zx8754 mentions, consider using lapply() to keep all data frames in one list instead of as separate objects in your environment (the code below also shows how to output individual data frames from the list, if you need them):

path <- "C://Home/Data/"
files <- list.files(path=path, pattern="^.file*\\.csv$")
mycols <- c("test", "trialruns", "practice")

# READ IN ALL FILES AND DROP THE COLUMNS LISTED IN mycols
dfList <- lapply(files, function(f) {
   df <- read.csv(paste0(path, f))
   df[, !(names(df) %in% mycols), drop = FALSE]
})

# SET NAMES TO EACH DF ELEMENT
dfList <- setNames(dfList, gsub("\\.csv$", "", files))

# IN CASE YOU REALLY NEED INDIVIDUAL DFs
list2env(dfList, envir=.GlobalEnv)

# IN CASE YOU NEED TO APPEND ALL DFs
finaldf <- do.call(rbind, dfList)

# TO RETRIEVE FIRST DF
dfList[[1]]  # OR dfList$file01
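Here is a self-contained sketch of the same list-based approach, run end to end on hypothetical demo files written to a temporary directory. It drops the columns named in `mycols`, as the question's `file01` line does. The sketch uses `tools::file_path_sans_ext()` to strip the extension instead of `gsub()`. This is a substitution, not part of the answer above, and it avoids having to escape the regex `.`.

```r
mycols <- c("test", "trialruns", "practice")

# Hypothetical demo files standing in for the real CSVs in C://Home/Data/
path <- file.path(tempdir(), "so_pipeline")
dir.create(path, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = i, test = i, trialruns = i, practice = i, score = i * 10),
            file.path(path, sprintf("file%02d.csv", i)), row.names = FALSE)
}

files <- list.files(path = path, pattern = "^file.*\\.csv$", full.names = TRUE)

# Read each file and drop the unwanted columns, keeping everything else
dfList <- lapply(files, function(f) {
  df <- read.csv(f)
  df[, !(names(df) %in% mycols), drop = FALSE]
})

# Name the elements after the files, without directory or extension: "file01", ...
names(dfList) <- tools::file_path_sans_ext(basename(files))

# Stack them; this works because every trimmed data frame has the same columns
finaldf <- do.call(rbind, dfList)
```

`do.call(rbind, dfList)` stops with an error if the trimmed data frames end up with different column names, so it only suits files that share a layout.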

Thanks so much for the suggestion. When I try to run this code, I get: Error: unexpected symbol in: "# SET NAMES TO EACH DF ELEMENT dfList" — jayz323, Sep 25 '16 at 21:13
Not sure why you are getting that, as that line is a comment. Carefully compare your implementation with this example. Do note: `mycols` moved to the top. — Parfait, Sep 26 '16 at 01:26
