How can I read multiple csv files into R at once and know which file the data is from?

Question

I want to read multiple CSV files into R and combine them into one large table. However, I need a column that identifies which file each row came from.

Every row has a unique identifying number within a file, but those numbers are repeated across files. So if I bind all files into one table without knowing which file each row is from, I no longer have a unique identifier, which makes my planned analysis impossible.

This is what I have so far, but it doesn't record which file the data came from.

library(magrittr)  # provides %>%
library(rlist)     # provides list.rbind()
list_file <- list.files(pattern = "\\.csv$") %>% lapply(read.csv, stringsAsFactors = FALSE)
combo_data <- list.rbind(list_file)

I have about 100 files to read in so I'd really appreciate any help so I don't have to do them all individually.

score 7 · Answer 1 · answered Nov 25 '20 at 08:35

One way would be to use map_df from purrr to bind all the CSVs into one data frame with a column that identifies the source file.

# pattern is a regular expression, not a glob
filenames <- list.files(pattern = "\\.csv$")

# Naming the vector makes .id store the file name instead of a positional index
combo_data <- purrr::map_df(purrr::set_names(filenames), read.csv,
                            stringsAsFactors = FALSE, .id = "filename")

Alternatively:

library(dplyr)  # provides %>% and mutate()

combo_data <- purrr::map_df(filenames,
              ~ read.csv(.x, stringsAsFactors = FALSE) %>% mutate(filename = .x))

In base R:

combo_data <- do.call(rbind, lapply(filenames, function(x) 
                     cbind(read.csv(x, stringsAsFactors = FALSE), filename = x)))
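To check the base R version end to end, here is a small sketch that writes two throwaway CSV files with overlapping ids, reads them back, and then builds a key that is unique across files. The directory name csv_demo, the columns id and value, and the row_key column are made up for illustration; pasting the file name and id together is just one possible way to rebuild a unique identifier.

```r
# Hypothetical demo data: two small CSV files whose `id` values overlap.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:2, value = c(30, 40)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

# full.names = TRUE returns paths that read.csv() can open
# regardless of the current working directory.
paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and tag its rows with the file name (directory stripped).
combo_data <- do.call(rbind, lapply(paths, function(p) {
  cbind(read.csv(p, stringsAsFactors = FALSE), filename = basename(p))
}))

# Combine file name and per-file id into a key that is unique across files.
combo_data$row_key <- paste(combo_data$filename, combo_data$id, sep = "_")

# Fails loudly if any key is repeated.
stopifnot(anyDuplicated(combo_data$row_key) == 0)
```

With around 100 files the same code applies unchanged; only demo_dir would point at the real folder.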

score 4 · Accepted Answer · answered Nov 25 '20 at 08:40

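If you want to use base R, you can use:

file.names <- list.files(pattern = "\\.csv$")

# Read each file and record its name in a new column
df.list <- lapply(file.names, function(file.name) {
  df           <- read.csv(file.name)
  df$file.name <- file.name
  df
})

# list.rbind() comes from the rlist package; do.call(rbind, ...) is the base R equivalent
df <- do.call(rbind, df.list)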

score 1 · Answer 3 · answered May 05 '22 at 15:18

As the other answers suggest, the tidyverse now makes this easier:

library(readr)
library(purrr)
library(dplyr)
library(stringr)

df <- fs::dir_ls(regexp = "\\.csv$") %>%
  map_dfr(read_csv, id = "path") %>%
  # strip the .csv extension to get a clean label
  mutate(thename = str_replace(path, "\\.csv$", "")) %>%
  select(-path)
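With readr 2.0.0 or later, read_csv() itself accepts a vector of paths together with an id argument, so the purrr step may not be needed. A minimal sketch, assuming all files share the same columns; the demo files, the readr_demo directory, and the column name source_file are made up for illustration.

```r
library(readr)

# Hypothetical demo data written to a temporary directory.
demo_dir <- file.path(tempdir(), "readr_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(data.frame(id = 1:2, value = c(10, 20)), file.path(demo_dir, "a.csv"))
write_csv(data.frame(id = 1:2, value = c(30, 40)), file.path(demo_dir, "b.csv"))

paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# read_csv() reads every path and stores the originating path in `source_file`.
df <- read_csv(paths, id = "source_file", show_col_types = FALSE)

# Keep only the file name, without directory or extension.
df$source_file <- tools::file_path_sans_ext(basename(df$source_file))
```

This keeps the file label in a single call, at the cost of requiring a reasonably recent readr.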
