1

I have 80 separate .csv files with the same columns and headers. I was able to import them and rbind them into one data frame using the following commands:

 file_names <- dir("~/Desktop/data") 
 df <- do.call(rbind,lapply(file_names,read.csv))

But I would like to add a new variable ("name") that identifies which .csv file each observation came from. For example, "name" would be "NY" for all observations from the 'NY.csv' file and "DC" for all observations from the 'DC.csv' file, and so on. Is there any way to do this without adding the new column manually to each .csv? Thanks!

Agustín Indaco

3 Answers

2

This should do it:

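# full.names = TRUE returns paths that read.csv() can open from any working directory
file_names <- dir("~/Desktop/data", pattern = "\\.csv$", full.names = TRUE)
# name = file name without directory or extension, e.g. "NY" for NY.csv
df <- do.call(rbind, lapply(file_names, function(x) cbind(read.csv(x), name = tools::file_path_sans_ext(basename(x)))))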
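A self-contained sketch of the same idea in base R, runnable without the original data. It assumes two small made-up files (NY.csv and DC.csv, with hypothetical columns `x` and `y`) written to a temporary directory; `read_with_name` is an illustrative helper name, not something from the question.

```r
# Hypothetical example data: two small CSV files in a temporary directory
data_dir <- file.path(tempdir(), "csv_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2, y = c("a", "b")),
          file.path(data_dir, "NY.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5, y = c("c", "d", "e")),
          file.path(data_dir, "DC.csv"), row.names = FALSE)

# full.names = TRUE keeps the directory in each path, so read.csv() works
# regardless of the current working directory
file_names <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Illustrative helper: read one file and tag its rows with the file's name
read_with_name <- function(path) {
  out <- read.csv(path)
  # basename() drops the directory, file_path_sans_ext() drops ".csv"
  out$name <- tools::file_path_sans_ext(basename(path))
  out
}

df <- do.call(rbind, lapply(file_names, read_with_name))
```

Here every row from NY.csv ends up with `name == "NY"` and every row from DC.csv with `name == "DC"`. Assigning the column inside a named helper keeps the `read.csv()` call and the new column visibly separate, which makes misplaced parentheses easier to spot.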
mpjdem
  • Hey @mpjdem I like where you are going with this, but I get the following error. Any idea why/how to solve it? Thanks! "Error in !header : invalid argument type" – Agustín Indaco Dec 09 '16 at 21:02
  • Does it help to explicitly set `header=TRUE` in the `read.csv()` call? (supposing you do have a header; otherwise `header=FALSE`) – mpjdem Dec 09 '16 at 21:06
  • No, I still get the same error. I have a header; can't figure it out. @mpjdem – Agustín Indaco Dec 09 '16 at 21:53
1

With readr >= 2.0 just add the id option:

library(readr)
read_csv(file_names, id = "name")

If you would like to remove the csv at the end:

library(dplyr)
library(stringr)
read_csv(file_names, id = "name") %>%
   mutate(name = str_remove(name, "\\.csv$"))

See this thread for more options.
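A hedged end-to-end sketch of the readr approach, assuming readr >= 2.0 and dplyr are installed. The two files and their columns are made up for illustration. Because the `id` column holds each path exactly as it was passed in, the sketch strips both the directory and the extension to get values like "NY".

```r
library(readr)
library(dplyr)

# Hypothetical example data: two small CSV files in a temporary directory
data_dir <- file.path(tempdir(), "csv_demo_readr")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(data_dir, "NY.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(data_dir, "DC.csv"), row.names = FALSE)

file_names <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# id = "name" adds a column with the source path of every row;
# basename() and file_path_sans_ext() reduce that path to e.g. "NY"
df <- read_csv(file_names, id = "name", show_col_types = FALSE) %>%
  mutate(name = tools::file_path_sans_ext(basename(name)))
```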

Rodrigo Zepeda
0

Use the idcol argument from data.table's rbindlist() function:

# get a vector of all .csv file paths (full.names = TRUE so read.csv() can find them)
myfiles <- list.files("path/to/directory/", pattern = "\\.csv$", full.names = TRUE)

# loop over file paths, reading in and saving each data.frame as an element in a list
n <- length(myfiles)
datalist <- vector(mode = "list", length = n)
for (i in seq_len(n)) {
    cat("importing file", i, ":", myfiles[i], "\n")
    datalist[[i]] <- read.csv(myfiles[i])
}

# name the list elements after their files, without directory or extension
names(datalist) <- tools::file_path_sans_ext(basename(myfiles))

# combine all data.frames in datalist; idcol = "name" stores each element's name in a "name" column
all_data <- data.table::rbindlist(datalist, idcol = "name")
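The same steps can be written more compactly with `lapply()` and data.table's `fread()`. This is a sketch under the assumption that data.table is installed; the example files and their columns are hypothetical.

```r
library(data.table)

# Hypothetical example data: two small CSV files in a temporary directory
data_dir <- file.path(tempdir(), "csv_demo_dt")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(data_dir, "NY.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(data_dir, "DC.csv"), row.names = FALSE)

myfiles <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# read every file into a list of data.tables
datalist <- lapply(myfiles, fread)

# name each list element after its file, without directory or extension
names(datalist) <- tools::file_path_sans_ext(basename(myfiles))

# idcol = "name" creates a "name" column filled with the list element names
all_data <- rbindlist(datalist, idcol = "name")
```

With a named list, `rbindlist()` fills the id column with the element names. With an unnamed list it would hold integer positions instead, so the `names(datalist) <- ...` step is what makes the column useful.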
DanY