I want to read multiple CSV files into R and combine them into one large table. However, I need a column that identifies which file each row came from.

Every row has a unique identifying number within its file, but those numbers are repeated across files. If I bind all the files into one table without recording which file each row came from, I lose the unique identifier, which makes my planned analysis impossible.

This is what I have so far, but it doesn't tell me which file each row came from.

list_file <- list.files(pattern="*.csv") %>% lapply(read.csv,stringsAsFactors=F)
combo_data <- list.rbind(list_file)

I have about 100 files to read in so I'd really appreciate any help so I don't have to do them all individually.

saxonryan

3 Answers

One way is to use map_df from purrr to bind all the CSVs into one data frame, with an extra column that identifies the source file.

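# pattern is a regular expression, so match a literal ".csv" at the end of the name
filenames <- list.files(pattern = "\\.csv$")

# .id records the position of each file in `filenames`;
# as.integer() guards against that index being returned as a character string
purrr::map_df(filenames, read.csv, stringsAsFactors = FALSE, .id = 'filename') %>%
  dplyr::mutate(filename = filenames[as.integer(filename)]) -> combo_data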

Also :

combo_data <- purrr::map_df(filenames, 
              ~read.csv(.x, stringsAsFactors = FALSE) %>% dplyr::mutate(filename = .x))

In base R :

combo_data <- do.call(rbind, lapply(filenames, function(x) 
                     cbind(read.csv(x, stringsAsFactors = FALSE), filename = x)))
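Since the goal in the question is a row identifier that stays unique after binding, here is a minimal sketch of how the filename column could be combined with the per-file number. The demo files (a.csv, b.csv), the column names id and value, and the new column uid are all made up for illustration; substitute whatever your files actually contain.

```r
# Hypothetical demo data: two small CSV files whose ids repeat across files
tmp <- tempfile("csv_demo")
dir.create(tmp)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:2, value = c(30, 40)),
          file.path(tmp, "b.csv"), row.names = FALSE)

filenames <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Same base R approach as above; basename() drops the directory part
combo_data <- do.call(rbind, lapply(filenames, function(x)
  cbind(read.csv(x, stringsAsFactors = FALSE), filename = basename(x))))

# Assumed column name `id`: paste it onto the file name to get a key
# that is unique across the combined table
combo_data$uid <- paste(combo_data$filename, combo_data$id, sep = "_")

# Sanity check: no uid occurs twice
stopifnot(!anyDuplicated(combo_data$uid))
```

Whether you need a separate uid column depends on the analysis; grouping by both filename and the original number would work just as well.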
Ronak Shah

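If you want to stay in base R, you can use

# pattern is a regular expression, so match a literal ".csv" at the end of the name
file.names <- list.files(pattern = "\\.csv$")

# Read each file and tag its rows with the file name
df.list <- lapply(file.names, function(file.name) {
  df           <- read.csv(file.name)
  df$file.name <- file.name
  return(df)
})

# do.call(rbind, ...) keeps this in base R; list.rbind comes from the rlist package
df <- do.call(rbind, df.list)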
MacOS

As another answer suggested, the tidyverse now makes this easier:

library(readr)
library(purrr)
library(dplyr)
library(stringr)

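# id = 'path' is passed through to read_csv (supported in readr 2.0.0 and later),
# which adds a column holding the path of the file each row was read from
df <- fs::dir_ls(regexp = "\\.csv$") %>%
  map_dfr(read_csv, id = 'path') %>%
  # strip the ".csv" extension to get a shorter file label
  mutate(thename = str_replace(path, "\\.csv$", "")) %>%
  select(-path)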
Andrés Parada