46

Suppose we have files file1.csv, file2.csv, ... , and file100.csv in directory C:\R\Data and we want to read them all into separate data frames (e.g. file1, file2, ... , and file100).

The reason for this is that, despite having similar names, they have different file structures, so it is not that useful to keep them in a list.

I could use lapply, but that returns a single list containing 100 data frames. Instead, I want these data frames in the Global Environment.
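.
For context, the lapply approach I mean looks roughly like this (just a sketch of the pattern, not my actual code):

#read every CSV in the folder into a single list of 100 data frames
files <- list.files("C:/R/Data", pattern = "\\.csv$", full.names = TRUE)
all_data <- lapply(files, read.csv)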

How do I read multiple files directly into the global environment? Or, alternatively, how do I unpack the contents of a list of data frames into it?

Joshua Ulrich
Fred
  • @Roman Luštrik Please see comment to @hadley below. Note I did not ask "What is the best way to read X number of files into R?". My question is more specific for a reason. I guess I should not have said I wanted to read 100 files (simply trying to be general) but 8 different files with similar names. But there are too many people here off on their high horse. – Fred Mar 16 '11 at 13:22
  • For people who happen upon this post and have a set of 100 identically (or nearly so) structured files, your best bet is to read the data into a named list as hadley and joran explain below. For an additional perspective, take a look at Gregor's response to [this post](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames) as to why this is beneficial. – lmo Jul 28 '19 at 18:53
  • If they are completely different structures, how is reading them in a loop helpful? They need to be handled by separate code anyway! — Just read them separately. – Konrad Rudolph Jan 24 '23 at 15:13

11 Answers

36

Thank you all for replying.

For completeness, here is my final answer for loading any number of (tab-)delimited files, in this case with 6 columns of data each, where column 1 is character, column 2 is a factor, and the remainder are numeric:

##Read files named xyz1111.csv, xyz2222.csv, etc.
filenames <- list.files(path="../Data/original_data",
                        pattern="xyz+.*csv")

##Create a vector of data frame names without the ".csv" part
names <- substr(filenames, 1, 7)

##Load all files
for(i in names){
    filepath <- file.path("../Data/original_data", paste(i, ".csv", sep=""))
    assign(i, read.delim(filepath,
                         colClasses=c("character","factor",rep("numeric",4)),
                         sep = "\t"))
}
Mus
Fred
31

Quick draft, untested:

  1. Use list.files() (aka dir()) to dynamically generate your list of files.

  2. This returns a vector; just run along the vector in a for loop.

  3. Read the i-th file, then use assign() to place its contents into a new variable file_i (see the sketch below).

That should do the trick for you.
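.
A rough sketch of those three steps (equally untested; it names each object after its file rather than file_i, and assumes the CSV files sit in the working directory):

## Step 1: list the files dynamically
csv_files <- list.files(pattern = "\\.csv$")

## Steps 2 and 3: run along the vector, read each file, and
## assign() its contents to a variable named after the file
for (f in csv_files) {
  assign(sub("\\.csv$", "", f), read.csv(f))
}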

Dirk Eddelbuettel
  • @Dirk Eddelbuettel Thanks, that works. Indeed that is what I tried to do originally but using `i <- read.csv(...)` inside the loop instead of `assign(i,read.csv(...))`. Why doesn't the former work? – Fred Mar 16 '11 at 01:19
  • Local scope versus global environment. You could try `i <<- read.csv(...)` as well. – Dirk Eddelbuettel Mar 16 '11 at 01:21
  • @Dirk Eddelbuettel Many thanks, final question: Had I used `lapply` and dumped everything inside a list, how would I "unpack it"? I ask because `lapply` is much faster and I dislike loops. – Fred Mar 16 '11 at 01:25
  • Prove that `lapply` is faster in reading N files. Moreover, if *you* dislike loops the burden is on you to read up on the `*apply` family. And again, these days they are *not* generally faster. – Dirk Eddelbuettel Mar 16 '11 at 01:31
  • @Dirk Eddelbuettel Thanks. I was always told to avoid loops in `R`. Vectorization and all that. But maybe it's all lore... Easy to find out with `system.time()` I suppose. – Fred Mar 16 '11 at 01:50
  • Precisely. Profiling beats old wives' tales. – Dirk Eddelbuettel Mar 16 '11 at 01:52
  • Yowser, assign and <<- in the same answer! Has someone hijacked Dirk's account? – mdsumner Mar 16 '11 at 03:09
17

Use assign with a character variable containing the desired name of your data frame.

for(i in 1:100)
{
   oname <- paste("file", i, sep="")
   assign(oname, read.csv(paste(oname, ".csv", sep="")))
}
Hong Ooi
15

This answer is intended as a more useful complement to Hadley's answer.

While the OP specifically wanted each file read into their R workspace as a separate object, many other people naively landing on this question may think that that's what they want to do, when in fact they'd be better off reading the files into a single list of data frames.

So for the record, here's how you might do that.

#If the path is different than your working directory
# you'll need to set full.names = TRUE to get the full
# paths.
my_files <- list.files("path/to/files")

#Further arguments to read.csv can be passed in ...
all_csv <- lapply(my_files,read.csv,...)

#Set the name of each list element to its
# respective file name. Note full.names = FALSE to
# get only the file names, not the full path.
names(all_csv) <- gsub(".csv","",
                       list.files("path/to/files",full.names = FALSE),
                       fixed = TRUE)

Now any of the files can be referred to by all_csv[["filename"]], which really isn't much worse than just having separate filename variables in your workspace, and often it is much more convenient.
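.
A quick usage sketch (the element name "file1" is made up for illustration):

#Pull one data frame out of the named list
head(all_csv[["file1"]])

#Or operate on every data frame at once
sapply(all_csv, nrow)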

Steph Locke
joran
8

Here is a way to unpack a list of data.frames using just lapply

filenames <- list.files(path="../Data/original_data",
                        pattern="xyz+.*csv", full.names=TRUE)

filelist <- lapply(filenames, read.csv)

#if necessary, assign names to the data.frames
names(filelist) <- c("one","two","three")

#note the invisible function keeps lapply from spitting out the data.frames to the console
invisible(lapply(names(filelist),
                 function(x) assign(x, filelist[[x]], envir=.GlobalEnv)))
joran
Robert
  • You can "automate" the naming with `paste0("sheet_",1:length(filelist))`. – NelsonGon Apr 10 '19 at 17:07
  • When I use this method, the space between my column names is replaced by a ".". For example column "Warehouse Code" becomes "Warehouse.Code". Do you know how to keep the column format unchanged? – jb12n Jun 12 '19 at 15:07
  • Hi! I'm wondering how I can pass a number of parameters to the `read.csv` function instead of calling it with the default ones, e.g. `read.csv(as.is = T, header = T, comment.char = "")`. – Denis May 04 '20 at 19:32
6

Reading all the CSV files from a folder and creating data frames named after the files:

setwd("your path to folder where CSVs are")

filenames <- gsub("\\.csv$","", list.files(pattern="\\.csv$"))

for(i in filenames){
  assign(i, read.csv(paste(i, ".csv", sep="")))
}
Manoj Kumar
3

A simple way to access the elements of a list from the global environment is to attach the list. Note that this actually creates a new environment on the search path and copies the elements of your list into it, so you may want to remove the original list after attaching to prevent having two potentially different copies floating around.
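.
A minimal sketch of that approach, assuming all_csv is a named list of data frames like the one built in joran's answer above:

attach(all_csv)   #copies each list element into a new environment on the search path
rm(all_csv)       #optional: remove the original list so only one copy remains

#each data frame is now visible by its list name, e.g.
head(file1)       #"file1" is one of the hypothetical element names

#detach("all_csv") removes the attached copies again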

Aaron left Stack Overflow
1

I want to update the answer given by Joran:

#If the path is different than your working directory
# you'll need to set full.names = TRUE to get the full
# paths.
my_files <- list.files(path="set your directory here", full.names=TRUE)
# full.names = TRUE is important here

#Further arguments to read.csv can be passed in ...
all_csv <- lapply(my_files, read.csv)

#Set the name of each list element to its
# respective file name. Note full.names = FALSE to
# get only the file names, not the full path.
names(all_csv) <- gsub(".csv", "",
                       list.files("copy and paste your directory here",
                                  full.names = FALSE),
                       fixed = TRUE)

#Now you can create a dataset based on each filename
df <- as.data.frame(all_csv$nameofyourfilename)
Edwin
0

A simplified version, assuming your CSV files are in the working directory:

listcsv <- list.files(pattern="\\.csv$") #creates a vector of csv file names
names <- substr(listcsv, 1, nchar(listcsv)-4) #file names without the ".csv" part
#cycle through the names and assign each relevant dataframe using read.csv
for (k in seq_along(listcsv)){
  assign(names[k], read.csv(listcsv[k]))
}
Stefano Verugi
-1
#copy all the files you want to read into your working directory
a <- dir()

#using lapply to remove the ".csv" from the filenames
list1 <- lapply(a, function(x) gsub(".csv", "", x, fixed = TRUE))

#Final step: read each file and assign it to an object with that name
for(i in list1){
  assign(i, read.csv(paste(i, ".csv", sep="")))
}
-1

Use list.files and map_dfr to read many csv files

df <- list.files(data_folder, full.names = TRUE) %>%
    map_dfr(read_csv)

Reproducible example

First write sample csv files to a temporary directory. It's more complicated than I thought it would be.

library(dplyr)
library(purrr)
library(purrrlyr)
library(readr)
data_folder <- file.path(tempdir(), "iris")
dir.create(data_folder)
iris %>%
    # Keep the Species column in the output
    # Create a new column that will be used as the grouping variable
    mutate(species_group = Species) %>%
    group_by(species_group) %>%
    nest() %>%
    by_row(~write.csv(.$data,
                      file = file.path(data_folder, paste0(.$species_group, ".csv")),
                      row.names = FALSE))

Read these csv files into one data frame. Note the Species column has to be present in the csv files, otherwise we would lose that information.

iris_csv <- list.files(data_folder, full.names = TRUE) %>%
    map_dfr(read_csv)
Paul Rougieux
  • The question asked how to read all the files as separate data frames, not one dataframe. – Jas Jan 24 '23 at 03:17
  • @jas thank you. It is correct. I moved this answer to a new question and answer called [How to read multiple csv files into a single data frame in R](https://stackoverflow.com/questions/75222347/how-to-read-multiple-csv-files-into-a-single-data-frame-in-r/75222348#75222348) – Paul Rougieux Jan 24 '23 at 13:42