Merging only parts of CSV files and adding a column with the CSV file name

Question

I want to merge the CSV files stored in the working directory and its subfolders. This piece of code runs smoothly:

csv_files <- dir(pattern='.*[.]csv', recursive = T)
list.files()

my_data_frame <- do.call(rbind,lapply(csv_files,read.csv))

So far so good. I now want to add a column containing the name of the CSV file each row came from. Furthermore, I want to extract only part of each CSV file, say rows 5 through 10.

Thanks for your precious help!

Possible duplicate: https://stackoverflow.com/questions/46299777/add-filename-column-to-table-as-multiple-files-are-read-and-bound — MrFlick, Mar 21 '18 at 20:39
Possible duplicate: https://stackoverflow.com/questions/5186570/when-importing-csv-into-r-how-to-generate-column-with-name-of-the-csv — MrFlick, Mar 21 '18 at 20:41
I've already tried the solutions proposed in the possible duplicates. They do not readily fit my specific issue. — Gianluca, Mar 21 '18 at 20:45
If you're having a hard time understanding the solutions at those other questions, another option would be to use `dplyr::bind_rows()` instead of `rbind` and wrap `lapply()` in `setNames()` to create a named list instead. Pulling out the 5th-10th rows can then be done with `dplyr::group_by()` followed by `slice()`. — joran, Mar 21 '18 at 20:55
Would you please write down the code which integrates all these suggestions? That would be very helpful! Thanks! — Gianluca, Mar 21 '18 at 21:12
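
For reference, the dplyr route suggested in the comment above could be put together roughly as follows. This is only a sketch: it assumes the dplyr package is installed, the column name `fname` is an arbitrary choice, and the two demo files written to a temporary folder are made up so the snippet runs on its own.

```r
library(dplyr)

# Demo data (assumption): two 12-row CSV files, one of them in a subfolder.
demo_dir <- tempfile("csv_demo_")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE)
write.csv(data.frame(id = 1:12, value = (1:12) * 10),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 101:112, value = (1:12) * 100),
          file.path(demo_dir, "sub", "b.csv"), row.names = FALSE)

csv_files <- list.files(demo_dir, pattern = "[.]csv$",
                        recursive = TRUE, full.names = TRUE)

# Named list of data frames: the names become the values of the id column.
named_list <- setNames(lapply(csv_files, read.csv), basename(csv_files))

my_data_frame <- bind_rows(named_list, .id = "fname") %>%
  group_by(fname) %>%   # one group per source file
  slice(5:10) %>%       # rows 5 to 10 within each file
  ungroup()
```

With real data, the demo block would be dropped and `csv_files` built from the working directory as in the question.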

score 1 · Accepted Answer · answered Mar 22 '18 at 00:59

You could simply replace read.csv in the lapply call with your own function that does the subset and adds the new column. E.g.,

csv_files <- dir(pattern='.*[.]csv', recursive = T)
list.files()

#function to make df from each csv 
my_read_csv <- function(x) {
  dfx <- read.csv(x)[5:10,] #or any other subset
  dfx$fname <- basename(x) #add new column
  return(dfx)
}

my_data_frame <- do.call(rbind,lapply(csv_files,my_read_csv))
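
One thing to watch with the `[5:10,]` subset: if a file has fewer than 10 rows, indexing past the last row returns rows filled with NA. A minimal sketch of a guarded variant is below; the helper name `read_csv_rows`, its `rows` argument, and the two demo files written to a temporary folder are illustrative assumptions, not part of the code above.

```r
# Hypothetical helper: read one CSV, keep only the requested rows that exist,
# and record which file the rows came from.
read_csv_rows <- function(path, rows = 5:10) {
  df <- read.csv(path)
  rows <- rows[rows <= nrow(df)]            # drop row numbers the file does not have
  df <- df[rows, , drop = FALSE]
  df$fname <- rep(basename(path), nrow(df))
  df
}

# Demo data (assumption): a 12-row file and a 7-row file in a subfolder.
demo_dir <- tempfile("csv_demo_")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE)
write.csv(data.frame(id = 1:12), file.path(demo_dir, "long.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:7), file.path(demo_dir, "sub", "short.csv"), row.names = FALSE)

csv_files <- list.files(demo_dir, pattern = "[.]csv$",
                        recursive = TRUE, full.names = TRUE)
my_data_frame <- do.call(rbind, lapply(csv_files, read_csv_rows))
```

Note that `basename()` drops the folder part, so two files with the same name in different subfolders would get the same label; storing `path` itself in the new column is one way around that.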

Chris Holbrook

Function can be a one liner: `transform(read.csv(x)[5:10,], fname = basename(x))`. – Parfait Mar 22 '18 at 01:47
