5

I'm aggregating a bunch of CSV files in R, which I have done successfully using the following code (found here):

  Tbl <- list.files(path = "./Data/CSVs/",
         pattern="*.csv", 
         full.names = T) %>% 
   map_df(~read_csv(., col_types = cols(.default = "c"))) 

I want to include the .csv filename (ideally without the file extension) as a column in Tbl. I found a solution using plyr, but I want to stick with dplyr as plyr causes glitches further down my code.

Is there any way I can add something to the above code that will tell R to include the file name in Tbl$filename?

Many thanks!

Catherine Laing
  • Does this answer your question? [Add "filename" column to table as multiple files are read and bound](https://stackoverflow.com/questions/46299777/add-filename-column-to-table-as-multiple-files-are-read-and-bound) – camille Jan 23 '20 at 14:01

2 Answers

7

Here's my solution. Let me know if this helps.

Tbl <- list.files(path = "./Data/CSVs/",
         pattern = "\\.csv$",
         full.names = TRUE) %>%
   map_df(function(x) read_csv(x, col_types = cols(.default = "c")) %>%
            # basename() drops the directory; gsub() strips the trailing ".csv"
            mutate(filename = gsub("\\.csv$", "", basename(x))))
A Gore
0

It's difficult to know exactly what you want since the format of the data in your .csv files is unclear. But try gsub. Assuming you have a list of your files in Tbl.list:

library(dplyr)
library(purrr)  # map_df(), used further down
library(readr)  # read_csv(), used further down

Tbl.list <- list.files(path = "./Data/CSVs/",
                       pattern = "\\.csv$",
                       full.names = TRUE)

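Convert to a data.frame, then add a filename column by dropping the directory and subbing out the trailing ".csv" with "":

Tbl.df <-   data.frame( X1 = Tbl.list ) %>%
            # basename() removes "./Data/CSVs/"; the escaped, anchored pattern only matches the extension
            mutate( filename_wo_ext = gsub( "\\.csv$", "", basename( X1 ) ) )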
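A note on patterns like ".csv" passed to gsub: the pattern is a regular expression, so an unescaped dot matches any character. Here is a small base-R sketch of the difference, using a made-up file name (acsv_data.csv is purely illustrative, not one of your files):

```r
# Hypothetical path, chosen so that "csv" also appears inside the name
path <- "./Data/CSVs/acsv_data.csv"
name <- basename(path)                          # "acsv_data.csv"

# Unescaped dot: "." matches any character, so "acsv" is removed as well
loose  <- gsub(".csv", "", name)                # "_data"

# Escaped dot plus "$" anchor: only the trailing extension is removed
strict <- gsub("\\.csv$", "", name)             # "acsv_data"

# tools::file_path_sans_ext() ships with R and does the same job
helper <- tools::file_path_sans_ext(name)       # "acsv_data"
```

For file names that never contain "csv" anywhere except the extension, all three give the same result, so the loose pattern often appears to work.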

You could also try the following, but I'm not sure it'll work. (Let's assume you still have Tbl.list.) Start by changing your map_df statement to add an index column:

Tbl <- Tbl.list %>%
  map_df(~ read_csv(., col_types = cols(.default = "c")),
         .id = "index") %>%
  mutate( filename = gsub( "\\.csv$", "", basename( Tbl.list[as.numeric(index)] ) ) )

Because Tbl.list is unnamed, the index column should hold each file's position as a character string ("1" to "n"). The mutate statement converts index back to a number, looks up the matching path in Tbl.list, drops the directory, and subs out ".csv" with "".
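If the index lookup feels roundabout, a variation is to name the vector of paths first, since map_df writes the names of its input into the .id column when they exist. This is a minimal sketch rather than something run against your data: the temporary folder, the two demo files, and the names demo_dir and files are all made up for illustration.

```r
library(dplyr)
library(purrr)
library(readr)

# Hypothetical demo data: two small CSVs in a temporary folder
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(data.frame(a = 1:2), file.path(demo_dir, "first.csv"))
write_csv(data.frame(a = 3:5), file.path(demo_dir, "second.csv"))

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each path by its file name without the extension;
# map_df() then copies those names into the .id column
Tbl <- files %>%
  set_names(tools::file_path_sans_ext(basename(files))) %>%
  map_df(~ read_csv(.x, col_types = cols(.default = "c")), .id = "filename")
```

Each row of Tbl should then carry the name of the file it came from in Tbl$filename, with no separate mutate step.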

CPak