I am trying to read a series of .tsv output files into R and combine them into a single dataframe. However, each output file sits in a different sub-folder, and all of the files have the same name.

Example of file.tsv from Folder1

  A B   C
  1 1   40
  2 1   45

Example Paths:

A:/output/Folder1/file.tsv

A:/output/Folder2/file.tsv

A:/output/Folder3/file.tsv

The folder name holds important information, and I would like to preserve it in the aggregated dataframe.

Example:

  A B   C   folder 
  1 1   40  Folder1 
  2 1   45  Folder1 
  3 1   50  Folder2
  4 1   55  Folder2
  5 1   60  Folder3
  6 1   65  Folder3

I have found answers that show how to read in and append .tsv files from different folders (Read several files in different directories in r), but I am stuck on how to add a column with the folder name. I have 395 unique folders, so I only want to build the column by hand as a last resort.

Thank you for any insight you might have!

Charlotte

1 Answer

An adaptation of my answer here. You need the following steps:

# load the needed packages
library(dplyr)

# create a list of the filenames with the full path
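# note: 'pattern' is a regular expression, not a glob, so the extension is escaped and anchored
file.list <- list.files(pattern = '\\.tsv$', recursive = TRUE, full.names = TRUE)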

# read the files into a list
# 'simplify=FALSE' keeps the result as a list; sapply names each element by its full path
df.list <- sapply(file.list, read.delim, simplify=FALSE)

# bind the dataframes in the list together with 'bind_rows' from the dplyr-package
# use 'sub' inside 'mutate' to replace the full path with the folder name
df <- bind_rows(df.list, .id = "folder") %>%
  mutate(folder = sub('.*/(.*)/.*$', '\\1', folder))
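For anyone who wants to try the idea without dplyr, here is a minimal base R sketch of the same steps. It is not the code from the answer above: it swaps the regex for `basename(dirname(path))` and `bind_rows` for `do.call(rbind, ...)`, and it builds a throwaway directory tree under `tempdir()` with made-up values that mimic the example in the question, so it can be run anywhere.

```r
# Build a temporary directory tree that mimics the layout in the question.
# The 'root' location and the values written here are assumptions for illustration.
root <- file.path(tempdir(), "output")
for (i in 1:3) {
  folder <- file.path(root, paste0("Folder", i))
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  a <- c(2 * i - 1, 2 * i)
  write.table(data.frame(A = a, B = 1, C = 40 + 5 * (a - 1)),
              file.path(folder, "file.tsv"),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# 'pattern' is a regular expression, so the extension is escaped and anchored
file.list <- list.files(root, pattern = "\\.tsv$", recursive = TRUE, full.names = TRUE)

# read each file and tag its rows with the name of the folder that contains it
df.list <- lapply(file.list, function(path) {
  d <- read.delim(path)
  d$folder <- basename(dirname(path))
  d
})

# stack the pieces into one data frame
df <- do.call(rbind, df.list)
print(df)
```

`basename(dirname(path))` returns the immediate parent folder only; if the informative folder sits higher up the path, a regex like the one in the answer is probably the easier route.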
Jaap
  • Thank you so much for your help! The only change I had to make was to specify that the files are tab separated rather than comma separated, which I did like so: `df.list <- sapply(file.list, read.csv, sep = '\t', simplify=FALSE)`. Otherwise, the files didn't get read correctly. – Charlotte May 21 '18 at 20:44
  • @Charlotte That works indeed. I see that I forgot to change `read.csv` to `read.delim` (which has `sep = "\t"` as default). – Jaap May 22 '18 at 06:45
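
The point made in these two comments can be checked with a small sketch. The file below is made up and written to a temporary location; the only thing being compared is the default separator of `read.delim` and `read.csv`.

```r
# write a small tab-separated file to a temporary location (made-up data)
tsv <- tempfile(fileext = ".tsv")
writeLines(c("A\tB\tC", "1\t1\t40", "2\t1\t45"), tsv)

# read.delim uses sep = "\t" by default
via_delim <- read.delim(tsv)

# read.csv defaults to sep = ",", so the tab has to be given explicitly
via_csv <- read.csv(tsv, sep = "\t")

# compare the two results, then count the columns when the separator is left at ","
print(identical(via_delim, via_csv))
print(ncol(read.csv(tsv)))
```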