Iterate over a subset of a vector using indices

Question

I can iterate over all the files in a directory. However, I want to iterate over only certain files, selected by index.

I want to do this with a file_id vector, where each element is an index into "files".

For example: I have 500 files in a directory and only want to iterate over three of them, files 2, 4, and 15. I created a vector file_id = c(2, 4, 15). How can I iterate over only these indices, i.e. files[2], files[4], and files[15], so that I read data from files 2, 4, and 15 instead of all 500?

#get a list of all the files in directory.
files <- list.files(directory, full.names = TRUE) 

#iterate over all the files in directory, and get file data
for (item in files){
    filedata <- read.csv(item)
}

#What I want is to iterate only over the files listed in the file_id vector, i.e. open files 2, 4, and 15 and nothing else.
file_id = c(2, 4, 15)

`files <- list.files(directory, full.names = TRUE)[c(1,4,15)]` and then use your `loop` — Jilber Urbina, Jan 18 '14 at 16:26
But you should be prepared for the files you're interested in to change order in `list.files`. If you explain a little bit more about your problem, we could help you find a more robust approach. — Peyton, Jan 18 '14 at 16:27

score 4 · Answer 1 · answered Jan 18 '14 at 16:52

When you iterate inside a for loop, the syntax is:

for(index in SET)

where index is your iterator and SET is anything that can be converted to a vector (even a matrix or array; the loop visits each element in turn).
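To illustrate that point, here is a minimal sketch. The names ids, seen, m and visited are made up for the example and are not part of the question:

```r
# A for loop visits each element of whatever follows `in`.
ids <- c(2, 4, 15)
seen <- c()
for (i in ids) {
    seen <- c(seen, i)   # i takes the values 2, 4, 15 in turn
}

# A matrix is traversed element by element, in column-major (storage) order.
m <- matrix(1:6, nrow = 2)
visited <- c()
for (x in m) {
    visited <- c(visited, x)
}
```

So a vector of indices such as file_id can be used directly as the SET.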

#get a list of all the files in directory.
files <- list.files(directory, full.names = TRUE) 

file_id <- c(2, 4, 15)

#iterate over only the files whose indices are in file_id, and get file data
#(note: filedata is overwritten on each pass, so process it inside the loop)
for (i in file_id){
    filedata <- read.csv(files[i])
}

Here, you only need to modify file_id to loop over a different set of files.
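If you need to keep the data from every selected file rather than only the last one read, one possible approach is to store each result in a list. The sketch below is only an illustration: it assumes a hypothetical temporary directory with five small CSV files (file01.csv to file05.csv) so that it can run on its own.

```r
# Hypothetical setup: write five small CSV files to a temporary directory.
directory <- file.path(tempdir(), "subset_demo")
dir.create(directory, showWarnings = FALSE)
for (k in 1:5) {
    write.csv(data.frame(id = k, value = k * 10),
              file.path(directory, sprintf("file%02d.csv", k)),
              row.names = FALSE)
}

files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
file_id <- c(2, 4)

# Preallocate a list so each file's data frame is kept, not overwritten.
filedata <- vector("list", length(file_id))
for (j in seq_along(file_id)) {
    filedata[[j]] <- read.csv(files[file_id[j]])
}
```

After the loop, filedata[[1]] holds the data from files[2] and filedata[[2]] the data from files[4].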

score 2 · Answer 2 · answered Jan 18 '14 at 16:52

Find all the files, then index them using the [] operator, as you would any vector.

all.files <- list.files(directory, full.names = TRUE) 
file_id <- c(2, 4, 15)
for (item in all.files[file_id])
    {
    filedata <- read.csv(item)
    }

For readability, I would avoid indexing the function call directly, which is what was suggested in the comments.

Even better, you can drop the for loop altogether and use one of the *apply functions.

For instance

all.files <- list.files(directory, full.names = TRUE) 
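file_id <- c(2, 4, 15)
#lapply always returns a list; sapply may simplify data frames with equal column counts into a matrix
filedata <- lapply(all.files[file_id], function(f)
    {
    read.csv(f)
    })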

This will return a list with an element per file. You can then access each file's content using the [[]] operator.

For instance

filedata[[2]]

will return the content of the second selected file (i.e. the one with index 4).
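As a self-contained sketch of this pattern, the example below uses lapply, which always returns a list, and names each element after its file so it can be retrieved by position or by name. The temporary directory and the names file01.csv to file05.csv are assumptions made only so the example can run:

```r
# Hypothetical setup: five small CSV files in a temporary directory.
directory <- file.path(tempdir(), "lapply_demo")
dir.create(directory, showWarnings = FALSE)
for (k in 1:5) {
    write.csv(data.frame(id = k, value = k * 10),
              file.path(directory, sprintf("file%02d.csv", k)),
              row.names = FALSE)
}

all.files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
file_id <- c(2, 4)

# Read only the selected files; the result is a list of data frames.
filedata <- lapply(all.files[file_id], read.csv)

# Label each element with its file name for lookup by name.
names(filedata) <- basename(all.files[file_id])

second  <- filedata[[2]]              # by position within the selection
by_name <- filedata[["file04.csv"]]   # by file name
```

Looking elements up by name may also help if the order returned by list.files changes, as one of the comments on the question warns.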
