I have N .tsv files saved in a folder named "data" inside my RStudio working directory, and I want to import them all at once as separate data frames. Below is how I read them one by one, but there are too many of them and I want something faster. The total number of files may also differ each time.

    # read files into R
    f1 <- read.table(file = 'a_CompositeSources/In1B1A_WDNdb_DrugTargetInteractions_CompositeDBs_Adhesion.tsv', sep = '\t', header = TRUE)
    f2 <- read.table(file = 'a_CompositeSources/In1B2A_WDNdb_DrugTargetInteractions_CompositeDBs_Cytochrome.tsv', sep = '\t', header = TRUE)

Based on this answer, I have used:

    library(readr)
    library(dplyr)


    ## Read files named xyz1111.tsv, xyz2222.tsv, etc.
    filenames <- list.files(path="C:/Users/user/Documents/kate/data",
                            pattern="*.tsv")

    ## Create list of data frame names without the ".tsv" part
    names <-gsub(".tsv", "", filenames)

    ###Load all files
    for(i in names){
      filepath <- file.path("C:/Users/user/Documents/kate/data",paste(i,".tsv",sep=""))
      assign(i, read.delim(filepath,
                           colClasses=c("factor","character",rep("numeric",2)),
                           sep = "\t"))
    }

but only the 1st file is read.

CSDev
firmo23
  • Consider using **one** list of similarly structured data frames and avoid flooding your global environment with *many* objects to tediously track and recall. So just use *tbl*: `tbl[[1]]`, `tbl[[2]]`, `tbl[[3]]` or `tbl$file1.csv`, `tbl$file2.csv`, `tbl$file3.csv` – Parfait Jul 28 '19 at 21:21
  • It might be difficult to answer this without knowing something about the files. I can do it with a contrived triple-saved `mtcars.tsv` and your `sapply(...) %>% bind_rows()` works fine. (That is the preferred method, btw; using a `for` loop with `assign` usually makes things much harder than they need to be.) – r2evans Jul 28 '19 at 21:23
  • @Parfait -- I typed the same comment, then realized (before posting) that the first 5 lines of that code block include almost exactly that. – r2evans Jul 28 '19 at 21:24
  • For above comment, `.csv` should be `.tsv`! – Parfait Jul 28 '19 at 21:27
  • firmo23, you might consider not using common functions as variable names: `names` and (since you're using `dplyr`) `tbl` are common-enough. While R is usually smart-enough to know which you want, it is not hard to contrive situations where it is not as clear ... and troubleshooting problems due to that can be unnecessarily difficult. – r2evans Jul 28 '19 at 21:28
  • They all have 4 columns: the first is a factor, the second is character, and the other 2 are numeric. The names are those that I have provided in the question. – firmo23 Jul 28 '19 at 21:28
  • firmo23, you use a sub-directory in your first (successfully-read?) examples: `'a_CompositeSources/In1B1A...tsv'`. Then you use `"C:/Users/user/Documents/kate/data"`, notably *without* `recursive=TRUE`. How certain are you that all files you expect are included in `files` and/or `filenames`? – r2evans Jul 28 '19 at 21:30
  • filenames and names are correctly created. I can see that. – firmo23 Jul 28 '19 at 21:34
  • Based on your question, then, the fact that you can successfully read `f1` and `f2` has nothing to do with the success or failure of reading files listed in `filenames`, since they are different files. Am I mistaken? Can you successfully do `readr::read_tsv(filenames[1])` and `readr::read_tsv(filenames[2])` and `readr::read_tsv(filenames[length(filenames)])`? – r2evans Jul 28 '19 at 21:40
  • You mention *"import them as separated data frames"* but then your code includes `bind_rows()`, which will combine them. Does this mean that your first few lines of code worked, but you didn't realize you were combining all frames into a single frame? (This might make you think that only one file was read.) If you do just `tbl <- sapply(files, read_tsv, simplify=FALSE)`, does `length(tbl)` indicate you read something in for all files? – r2evans Jul 28 '19 at 21:44
  • Sorry, ignore the `bind_rows()` chunk; it was there by mistake. I want separate data frames. – firmo23 Jul 28 '19 at 21:47
  • You also removed the code that I (and Parfait) suggest and that code works for me with multiple files (`sapply(filenames, readr::read_tsv, simplify = FALSE)`). So perhaps there is some form of disconnect with your `filenames`. – r2evans Jul 28 '19 at 21:57
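The list-based approach suggested in the comments can be sketched as follows. This is a minimal, hedged example rather than a diagnosis of the original problem: it writes three small throwaway files (hypothetical names `fileA`, `fileB`, `fileC`) into a temporary folder so that it runs anywhere, and it uses base `read.delim` in place of `readr::read_tsv`. Note that `pattern` in `list.files()` is a regular expression rather than a glob, and `full.names = TRUE` returns paths that can be read regardless of the working directory.

```r
# Hypothetical setup: create three small .tsv files in a temporary folder.
# In practice data_dir would point at the real "data" folder.
data_dir <- file.path(tempdir(), "tsv_list_demo")
dir.create(data_dir, showWarnings = FALSE)
for (nm in c("fileA", "fileB", "fileC")) {
  write.table(head(mtcars[, 1:4]), file.path(data_dir, paste0(nm, ".tsv")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# pattern is a regex: "\\.tsv$" matches names that end in ".tsv"
tsv_paths <- list.files(data_dir, pattern = "\\.tsv$", full.names = TRUE)

# simplify = FALSE keeps one data frame per file in a list
tbl_list <- sapply(tsv_paths, read.delim, simplify = FALSE)

# Name each element after its file, without folder or extension
names(tbl_list) <- tools::file_path_sans_ext(basename(tsv_paths))

# Quick check that every file was read
length(tbl_list) == length(tsv_paths)
```

If `length(tbl_list)` is smaller than expected, inspecting `tsv_paths` first shows whether the files were found at all; adding `recursive = TRUE` to `list.files()` also searches sub-folders.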

2 Answers

Here is a solution to your problem:

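    # Empty list to hold one data frame per file
    data <- list()

    # Load all files; `names` is the vector of file names without ".tsv" from the question
    for (i in names) {
      filepath <- file.path("C:/Users/user/Documents/kate/data", paste(i, ".tsv", sep = ""))
      # Use [[i]] so that the whole data frame is stored as a single list element
      data[[i]] <- read.delim(filepath, colClasses = c("factor", "character", rep("numeric", 2)), sep = "\t")
    }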
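
Once the data frames sit in a named list, they can be used individually or all together. The sketch below assumes a small stand-in list (`data_list`, with hypothetical element names `file_one` and `file_two` and made-up column names) instead of the real files. It also shows `list2env()`, which is one way to get separate objects if they are really needed; a fresh environment is used here so the example does not clutter the global one.

```r
# Hypothetical stand-in for the list built by the loop above
data_list <- list(
  file_one = data.frame(id = factor(c("a", "b")), target = c("x", "y"),
                        score = c(1.5, 2.5), weight = c(10, 20)),
  file_two = data.frame(id = factor(c("c", "d", "e")), target = c("p", "q", "r"),
                        score = c(0.1, 0.2, 0.3), weight = c(1, 2, 3))
)

# [[ ]] extracts a whole data frame; [ ] would return a one-element list
first_df <- data_list[["file_one"]]

# Apply the same operation to every data frame at once
row_counts <- vapply(data_list, nrow, integer(1))

# Optional: turn list elements into separate objects in an environment.
# list2env(data_list, envir = .GlobalEnv) would create global variables instead.
env <- list2env(data_list, envir = new.env())
ls(env)
```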
Not_Dave
You could try this with map():

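    library(dplyr)  # %>%, as_tibble(), mutate()
    library(purrr)  # map()

    # One row per file; the list-column `data` holds each file's data frame.
    # pattern is a regular expression, so "\\.tsv$" matches names ending in ".tsv".
    files <- list.files(path = "C:/Users/user/Documents/kate/data",
                        pattern = "\\.tsv$") %>%
      as_tibble() %>%
      mutate(
        data = map(value, ~ read.delim(glue::glue("C:/Users/user/Documents/kate/data/{.x}"), colClasses = c("factor", "character", rep("numeric", 2)), sep = "\t"))
      )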
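
The result of this approach is a table with a file-name column (`value`) and a list-column (`data`), so the individual data frames still have to be pulled out. A minimal sketch of that step follows. It assumes a base `data.frame` with a list-column as a stand-in for the tibble, plus hypothetical file names `alpha.tsv` and `beta.tsv`; the same extraction code should work on the tibble produced above.

```r
# Hypothetical stand-in for `files`: one row per file plus a list-column
files_tbl <- data.frame(value = c("alpha.tsv", "beta.tsv"),
                        stringsAsFactors = FALSE)
files_tbl$data <- list(
  data.frame(x = 1:2, y = c("a", "b")),
  data.frame(x = 3:5, y = c("c", "d", "e"))
)

# A single data frame, by position
second_df <- files_tbl$data[[2]]

# All data frames as a list named after the files (extension removed)
frames <- setNames(files_tbl$data, tools::file_path_sans_ext(files_tbl$value))
names(frames)
```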
tvdo