I want to merge the csv files stored in the working directory and its subfolders. This piece of code runs smoothly:

csv_files <- dir(pattern = '.*[.]csv', recursive = TRUE)
list.files()

my_data_frame <- do.call(rbind, lapply(csv_files, read.csv))

So far so good. I now want to add a column containing the names of these csv files. Furthermore, I want to extract only part of each csv file, let's say from the 5th row to the 10th.

Thanks for your precious help!

Gianluca
  • Possible duplicate: https://stackoverflow.com/questions/46299777/add-filename-column-to-table-as-multiple-files-are-read-and-bound – MrFlick Mar 21 '18 at 20:39
  • not cvs - csv file – Aleksandr Mar 21 '18 at 20:41
  • Possible duplicate: https://stackoverflow.com/questions/5186570/when-importing-csv-into-r-how-to-generate-column-with-name-of-the-csv – MrFlick Mar 21 '18 at 20:41
  • I've already tried the solutions proposed in the possible duplicates. They do not readily fit my specific issue. – Gianluca Mar 21 '18 at 20:45
  • If you're having a hard time understanding the solutions at those other questions, another option would be to use `dplyr::bind_rows()` instead of `rbind` and wrap `lapply()` in `setNames()` to create a named list instead. Pulling out the 5th-10th rows can then be done with `dplyr::group_by()` followed by `slice()`. – joran Mar 21 '18 at 20:55
  • Would you please write down the code that integrates all these suggestions? That would be very helpful! Thanks! – Gianluca Mar 21 '18 at 21:12
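
The `dplyr` route described in joran's comment could look roughly like the sketch below. It assumes `dplyr` is installed and that the files share compatible column types; the helper name `bind_csv_rows` is made up for illustration and does not come from the thread.

```r
library(dplyr)

# Hypothetical helper: bind rows 5-10 of each csv, tagged with the file path.
bind_csv_rows <- function(csv_files) {
  # Named list: one data frame per file, named by its path
  csv_list <- setNames(lapply(csv_files, read.csv), csv_files)

  bind_rows(csv_list, .id = "fname") %>%  # .id turns the list names into a column
    group_by(fname) %>%                   # so rows are counted within each file
    slice(5:10) %>%                       # row indices past the end of a file are ignored
    ungroup()
}
```

Calling `bind_csv_rows(dir(pattern = "[.]csv$", recursive = TRUE))` from the working directory should then give one combined table with an `fname` column. The `$` anchor in the pattern is an optional tightening so that names such as `data.csv.bak` are not picked up.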

1 Answer

You could simply replace read.csv in the lapply call with your own function that does the subset and adds the new column. E.g.,

csv_files <- dir(pattern = '.*[.]csv', recursive = TRUE)
list.files()

# function to make a data frame from each csv
my_read_csv <- function(x) {
  dfx <- read.csv(x)[5:10, ]  # rows 5 to 10, or any other subset
  dfx$fname <- basename(x)    # add new column holding the file name
  return(dfx)
}

# read every file with the custom function and stack the results
my_data_frame <- do.call(rbind, lapply(csv_files, my_read_csv))
Chris Holbrook
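
One thing to watch with `[5:10, ]`: for a file with fewer than 10 rows, base R pads the result with all-`NA` rows. Also, `basename()` drops the subfolder, so two files with the same name in different folders become indistinguishable. A minimal variant that avoids both, assuming that behaviour is wanted (the name `read_csv_rows` and its `rows` argument are illustrative, not part of the answer above):

```r
# Hypothetical helper: read one csv, keep only the requested rows that exist,
# and record the path of the file they came from.
read_csv_rows <- function(path, rows = 5:10) {
  dfx <- read.csv(path)
  keep <- rows[rows <= nrow(dfx)]    # drop row indices past the end of the file
  dfx <- dfx[keep, , drop = FALSE]   # drop = FALSE keeps a data frame even with one column
  dfx$fname <- rep(path, nrow(dfx))  # path as passed in; also safe when no rows remain
  dfx
}

csv_files <- dir(pattern = "[.]csv$", recursive = TRUE)
my_data_frame <- do.call(rbind, lapply(csv_files, read_csv_rows))
```

Because `dir()` is called with `recursive = TRUE`, the stored path is relative to the working directory and includes the subfolder.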