44

I have been wondering if anybody knows a way to create a loop that loads files/databases in R. Say I have some files like this: data1.csv, data2.csv, ..., data100.csv.

In some programming languages you can write something like data +{ x }+ .csv, which the system reads as datax.csv, and then loop over x.

Any ideas?

Dambo
DonC
  • 4
    This is pretty close to [Loading many files at once](http://stackoverflow.com/questions/3764292/loading-many-files-at-once). You're just loading a different type of file. – Joshua Ulrich Apr 22 '11 at 17:28

9 Answers

60

Sys.glob() is another possibility; its sole purpose is globbing, i.e. wildcard expansion.

dataFiles <- lapply(Sys.glob("data*.csv"), read.csv)

That will read all the files of the form data[x].csv into list dataFiles, where [x] is nothing or anything.

[Note this is a different pattern to that in @Joshua's Answer. There, list.files() takes a regular expression, whereas Sys.glob() just uses standard wildcards. Which wildcards can be used is system dependent; details can be found on the help page ?Sys.glob.]
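
To make the difference between the two pattern styles concrete, here is a minimal sketch. It assumes a throwaway directory with a few made-up file names (`data1.csv`, `data2.csv`, `metadata.csv.bak`, `notes.txt`); `glob2rx()` from the utils package converts a wildcard into an equivalent regular expression.

```r
# Hypothetical files in a temporary directory, for illustration only
tmp <- tempfile("globdemo")
dir.create(tmp)
invisible(file.create(file.path(
  tmp, c("data1.csv", "data2.csv", "metadata.csv.bak", "notes.txt")
)))

# Wildcard (glob): "*" stands for any run of characters
glob_hits <- basename(Sys.glob(file.path(tmp, "data*.csv")))

# Regular expression: "." is any character, "*" repeats the previous item,
# and the pattern may match anywhere inside the file name
regex_hits <- list.files(tmp, pattern = "data.*csv")

# glob2rx() translates the wildcard into an anchored regular expression
rx_hits <- list.files(tmp, pattern = glob2rx("data*.csv"))

print(glob_hits)
print(regex_hits)
print(rx_hits)
```

With these sample names, the unanchored regular expression should also pick up `metadata.csv.bak`, while the wildcard and its `glob2rx()` translation should not.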

Gavin Simpson
  • 2
    Is it possible to do this in a way where each item in the resulting list is named after the wildcard-captured bit? So given `"folder\*.csv"` each list item would be called `data1`, `data2` etc. I realise one of the loops below could work (with `assign()` perhaps?) but a non-loop solution feels more elegant. – mendy Jul 06 '21 at 10:17
  • This is great. I'll add that you can make these easily into one dataframe using `data.table::rbindlist(dataFiles)` – mikey Nov 03 '22 at 12:45
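
On the two comments above, one possible loop-free approach is to name the list after the file names and then stack it. This is only a sketch: the sample files are hypothetical and written to a temporary directory, and `do.call(rbind, ...)` is used as a base R stand-in for `data.table::rbindlist()`.

```r
# Hypothetical sample files written to a temporary directory
tmp <- tempfile("named")
dir.create(tmp)
for (i in 1:3) {
  write.csv(data.frame(id = i, value = i * 10),
            file.path(tmp, paste0("data", i, ".csv")),
            row.names = FALSE)
}

files <- Sys.glob(file.path(tmp, "data*.csv"))

# Name each list element after its file, minus directory and extension
dataFiles <- setNames(lapply(files, read.csv),
                      tools::file_path_sans_ext(basename(files)))

print(names(dataFiles))

# Stack everything into one data frame with base R
combined <- do.call(rbind, dataFiles)
```

Individual data frames can then be reached by name, e.g. `dataFiles$data2` or `dataFiles[["data2"]]`.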
35

See ?list.files.

myFiles <- list.files(pattern="data.*csv")

Then you can loop over myFiles.
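
A minimal sketch of such a loop, assuming a few hypothetical sample files in a temporary directory (the directory, the file contents, and the name `dataList` are made up for illustration):

```r
# Hypothetical sample files in a temporary directory
tmp <- tempfile("loopdemo")
dir.create(tmp)
for (i in 1:3) {
  write.csv(data.frame(x = i),
            file.path(tmp, paste0("data", i, ".csv")),
            row.names = FALSE)
}

# full.names = TRUE returns paths that work from any working directory
myFiles <- list.files(tmp, pattern = "data.*csv", full.names = TRUE)

# Pre-allocate a list and fill it one file at a time
dataList <- vector("list", length(myFiles))
for (i in seq_along(myFiles)) {
  dataList[[i]] <- read.csv(myFiles[i])
}
```

`seq_along(myFiles)` is used instead of `1:length(myFiles)` so that the loop body is skipped when no file matches.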

Joshua Ulrich
  • 1
    I assume you meant pattern="data*.csv" ... But will see if Gavin's advice helps me out here. Yeah it did... the "." is a wildcard in regex. – IRTFM Apr 22 '11 at 20:00
  • @DWin: I meant to match a single character zero or more times. – Joshua Ulrich Apr 22 '11 at 20:04
  • Perhaps safer to use "data.*\\.csv"? – IRTFM Apr 22 '11 at 20:07
  • 1
    @DWin: I'm not sure how that would be safer. My `.*` would capture the `.` before the file extension. If you really want to be safe/explicit, you could use `"^data[[:digit:]]*\\.csv$"`. :-) – Joshua Ulrich Apr 22 '11 at 20:11
  • 2
    My thought was that "data.*csv" wouldn't require the "." to be there at all. – IRTFM Apr 22 '11 at 20:14
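
The three patterns discussed in these comments can be compared directly with `grepl()`. The candidate file names below are made up for illustration:

```r
# Hypothetical file names to test the patterns against
candidates <- c("data1.csv", "data100.csv", "data.csv",
                "mydata_csv", "data1.csv.bak", "dataA.csv")

loose  <- grepl("data.*csv", candidates)                 # pattern from the answer
dotted <- grepl("data.*\\.csv", candidates)              # requires a literal ".csv"
strict <- grepl("^data[[:digit:]]*\\.csv$", candidates)  # anchored, digits only

print(data.frame(candidates, loose, dotted, strict))
```

Only the anchored pattern rejects names such as `data1.csv.bak` and `dataA.csv`; the loose pattern accepts every name in this sample, including `mydata_csv`.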
11

I would put all the CSV files in one directory, create an empty list, and loop over the directory to read every CSV file into that list.

setwd("~/Documents/")
ldf <- list() # creates an empty list
listcsv <- dir(pattern = "\\.csv$") # pattern is a regular expression: names ending in .csv
for (k in seq_along(listcsv)) { # seq_along() is safe when no file matches
  ldf[[k]] <- read.csv(listcsv[k])
}
str(ldf[[1]]) # inspect the first data frame
PAC
7

Read every CSV file and bind the rows into a single data frame; bind_rows() uses each file's column headers to line up the columns in the merged result.

library(dplyr)
library(readr)

# pattern is a regular expression, so match names ending in ".csv"
list_file <- list.files(pattern = "\\.csv$") %>%
  lapply(read.csv, stringsAsFactors = FALSE) %>%
  bind_rows()
5
# directory_path: the folder that holds the files
fi <- list.files(directory_path, full.names = TRUE)
dat <- lapply(fi, read.csv)

`dat` will contain the datasets in a list.

ah bon
CDX
  • 2
    that will list *all* files in `directory_path` which is not what is required. You need a `pattern` as per @Joshua's answer. – Gavin Simpson Apr 22 '11 at 18:55
2

Let's assume that your files have the file format that you mentioned in your question and that they are located in the working directory.

You can vectorise the creation of the file names if they have a simple naming structure, then apply a loading function to all the files (here I used the purrr package, but you can also use lapply).

library(purrr)
c(1:100) %>% paste0("data", ., ".csv") %>% map(read.csv)
epo3
  • I've been using a similar chunk of code to read in multiple .csv files, but is there a way to pass arguments to the read.csv function within map? Specifically, I want to pass `strings_as_factors = F`. Is this possible without creating my own custom read.csv function? – C. Denney Jun 21 '18 at 15:27
  • yes it just returns a warning saying that there was an unused argument. – C. Denney Jun 21 '18 at 19:44
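
On the question in the comments: base `lapply()` forwards any extra named arguments to the function it calls, and `purrr::map()` accepts extra arguments through `...` in the same way. Note that the `read.csv()` argument is spelled `stringsAsFactors`, so a name such as `strings_as_factors` is not recognised. A minimal sketch with hypothetical sample files in a temporary directory:

```r
# Hypothetical sample files in a temporary directory
tmp <- tempfile("argsdemo")
dir.create(tmp)
files <- file.path(tmp, paste0("data", 1:2, ".csv"))
for (i in seq_along(files)) {
  write.csv(data.frame(name = c("a", "b"), n = i), files[i], row.names = FALSE)
}

# Extra named arguments after the function are passed on to every call
dat <- lapply(files, read.csv, stringsAsFactors = FALSE)

# Equivalent form with an anonymous function
dat2 <- lapply(files, function(f) read.csv(f, stringsAsFactors = FALSE))
```

The anonymous-function form is a little longer but makes it explicit which argument receives the file name.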
2

Here's another solution using a for loop. I like it better than the others because of its flexibility and because all dfs are directly stored in the global environment.

Assuming you've already set your working directory, the loop reads each file in turn and stores it in the global environment under the name "datai" (data1, data2, and so on).

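ids <- 1:100 # indices of the files to read
for (i in ids) {
  objname <- paste0("data", i)          # name of the object to create, e.g. "data1"
  filename <- paste0("data", i, ".csv") # file to read, e.g. "data1.csv"
  assign(objname, read.csv(filename))
}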
ah bon
Maël
0
  1. First, set the working directory.
  2. Find and store all the files ending with .csv.
  3. Bind all of them row-wise.

Following is the code sample:

setwd("C:/yourpath")
temp <- list.files(pattern = "\\.csv$") # regular expression: names ending in .csv
allData <- do.call("rbind", lapply(temp, read.csv)) # read each file, then stack the rows
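
When the files are stacked like this, the rows no longer show which file they came from. One possible variation, sketched here with hypothetical sample files in a temporary directory and a made-up column name `source`, tags each row before binding:

```r
# Hypothetical sample files in a temporary directory
tmp <- tempfile("binddemo")
dir.create(tmp)
for (i in 1:2) {
  write.csv(data.frame(id = 1:2, score = c(i, i + 1)),
            file.path(tmp, paste0("data", i, ".csv")),
            row.names = FALSE)
}

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Read each file and record its name in an extra column
pieces <- lapply(files, function(f) {
  d <- read.csv(f)
  d$source <- basename(f)
  d
})

# rbind() requires that all pieces share the same column names
allData <- do.call(rbind, pieces)
```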
surajs1n
-1

This may be helpful if you have datasets for participants as in psychology/sports/medicine etc.

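library(haven) # provides read_sav() for SPSS .sav files

setwd("C:/yourpath")

temp <- list.files(pattern = "\\.sav$") # regular expression: names ending in .sav

# Maybe you want to exclude some IDs
DEL <- grepl("ID(04|08|11|13|19)\\.sav", temp)
temp2 <- temp[!DEL] # logical indexing also works when nothing matches

# Make a list that contains all datasets
read.all <- lapply(temp2, read_sav)
# View(read.all[[1]])

# Option 1: put one dataset under the next
df <- do.call("rbind", read.all)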

Option 2: compute something within each dataset (single ID), e.g. the mean of certain variables for each participant.

mw_extraktion <- function(data_raw) {
  data_raw <- data.frame(data_raw)
  # you may now calculate e.g. the mean of a certain variable for each ID
  ID <- data_raw$ID[1]
  # Var2 and Var3 are placeholders: compute your new variables (e.g. means) before this line
  data_OneID <- c(ID, Var2, Var3)
  data_OneID # return the summary vector for this ID
} # end of function
data_combined <- t(data.frame(sapply(read.all, mw_extraktion)))
SDahm