I am preparing tidy data from dozens of messy .csv files (different numbers of columns, different column names, some extra semicolons, etc.). Luckily, the variables that matter to me are named the same way in every file.

That's why I decided to select just these variables in code:

df <- list.files(path = "./", pattern = ".csv", full.names = FALSE) %>%
  lapply(function(x) read_csv2(x, col_select = c("variable1", "variable2"))) %>%
  bind_rows()

It works well, but I also need a variable holding the name of the file each row comes from. How can I add a variable with the source file name to the resulting data frame?

Maciej B.
  • Basically you could name your file list. Afterwards you could add the filename via the `.id` argument of `bind_rows`. For an example to do this using `map_df` see e.g. [How to add filename as a column to csv while reading & appending multiple csv's in r?](https://stackoverflow.com/questions/73645417/how-to-add-filename-as-a-column-to-csv-while-reading-appending-multiple-csvs/73645507#73645507) – stefan Sep 21 '22 at 18:08
  • There's no need for `lapply`: `read_csv2` can handle a list of files by itself, returns a combined tibble, and stores the filenames in a column defined by the `id` parameter. `list.files(path="./", pattern = ".csv", full.names = FALSE) %>% read_csv2(col_select = c("variable1","variable2"), id = "filename")` – margusl Sep 21 '22 at 18:40
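
The first comment's suggestion (name the file list, then pass `.id` to `bind_rows`) could look roughly like the sketch below. The temporary folder, the two demo files and their contents are made up for illustration only; `variable1` and `variable2` are the column names from the question. It assumes readr 2.0 or later, since `col_select` and `show_col_types` are used.

```r
library(readr)
library(dplyr)

# Hypothetical demo data: two small semicolon-separated files with
# different column sets, written to a temporary folder.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("variable1;variable2;extra", "1;a;x", "2;b;y"),
           file.path(demo_dir, "first.csv"))
writeLines(c("other;variable2;variable1", "z;c;3"),
           file.path(demo_dir, "second.csv"))

# Full paths are needed here because the files are not in the working directory.
files <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each path by its bare file name; lapply() keeps these names,
# and bind_rows() writes them into the column given by .id.
files <- setNames(files, basename(files))

df <- files %>%
  lapply(function(x) {
    read_csv2(x, col_select = c("variable1", "variable2"),
              show_col_types = FALSE)
  }) %>%
  bind_rows(.id = "filename")

print(df)
```

Using `basename()` for the names keeps only the file name in the `filename` column; naming the list with the unmodified paths would keep the full path instead.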
