
I have a bunch of files stored in folders; each folder has a gz file containing a txt file. I'm trying to read all the data into a list of data frames so that I can use the join function and get one data frame with all the data. All the txt files look like this (only much longer):

ENSG00000242268.2 4.121822e-01
ENSG00000270112.3 6.127670e-02
ENSG00000167578.15 4.284772e+00

I tried this code:

files <- list.files(path= getwd(),full.names = TRUE)
transcriptome_profiling <- list()
for (i in length(files)) {
  gzfiles <- list.files(path = files[i],full.names = TRUE)
  readgzf <- gzfile(description = gzfiles)
  transcriptome_profiling[[i]] <- read.table(file = readgzf)
}

In this case only the last object in the list contains data; the rest are NULL.

I also tried this code:

 files <- list.files(path= getwd(), full.names = TRUE)
 #reading all the gz file from within the folder in the root
 data <-lapply(files, function(x) {
     transcriptome_profiling <-data.frame(read.delim(file = gzfile(description = list.files(path = x,full.names = TRUE, pattern = "\\.gz$"))))
 })

But I only get a list of lists.

Any ideas on how to get a list of data frames to use with the join function?

  • Possibly helpful: https://stackoverflow.com/questions/23190280/whats-wrong-with-my-function-to-load-multiple-csv-files-into-single-dataframe. You can only pass one file at a time to `read.table` or `read.delim`; you need to loop/map/apply if you need to read multiple files from a folder. – MrFlick Sep 21 '18 at 19:39
  • I should have mentioned there is only one file in each gz folder, so in each iteration of the loop read.table only reads one file. – sahar Sep 21 '18 at 21:51

1 Answer


There's a slight error in your first attempt:

for (i in length(files))   # runs once, with i equal to length(files), so only the last element is filled
for (i in 1:length(files)) # i takes every value from 1 to the number of files
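
To make the fix concrete, here is a minimal sketch of the corrected loop. It builds a throwaway folder layout in a temporary directory so it can run on its own; the folder names, the file name `counts.txt.gz`, and the column names `gene_id` and `value` are assumptions for illustration, not taken from your data.

```r
# Build a throwaway layout: one sub-folder per sample, each holding one .gz file
# (folder, file, and column names are made up for illustration).
root <- tempfile("profiles_")
dir.create(root)
for (s in c("sampleA", "sampleB")) {
  dir.create(file.path(root, s))
  con <- gzfile(file.path(root, s, "counts.txt.gz"), "w")
  writeLines(c("ENSG00000242268.2 4.121822e-01",
               "ENSG00000270112.3 6.127670e-02"), con)
  close(con)
}

folders <- list.files(path = root, full.names = TRUE)
transcriptome_profiling <- vector("list", length(folders))

# seq_along() visits every index; length(folders) alone would visit only the last
for (i in seq_along(folders)) {
  gz_path <- list.files(path = folders[i], pattern = "\\.gz$", full.names = TRUE)
  # read.table() opens the unopened gzfile connection and closes it when done
  transcriptome_profiling[[i]] <- read.table(gzfile(gz_path),
                                             col.names = c("gene_id", "value"))
}
names(transcriptome_profiling) <- basename(folders)
```

`seq_along(folders)` behaves like `1:length(folders)` here, but it also does the right thing when the folder list is empty.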

With the corrected first attempt or the second, once you have the list, you can name its elements after the files and bind them into one data frame:

names(transcriptome_profiling) = files
transcriptome_profiling_df = data.table::rbindlist(transcriptome_profiling, idcol = "filename")

If you want each file's data to be its own column, you can use tidyr::spread on the result or, instead of rbindlist above, maybe use dplyr::bind_cols.
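
Since the goal was a join, one base R alternative (a different technique from the spread/bind_cols suggestions above) is to merge the list on the gene ID. This is only a sketch: the column names `gene_id` and `value` are assumed, and the `sampleB` numbers are made up for illustration.

```r
# Two small data frames standing in for the per-file tables
# (column names are assumed; the sampleB values are invented).
profiles <- list(
  sampleA = data.frame(gene_id = c("ENSG00000242268.2", "ENSG00000270112.3"),
                       value = c(0.4121822, 0.0612767)),
  sampleB = data.frame(gene_id = c("ENSG00000242268.2", "ENSG00000270112.3"),
                       value = c(0.5, 0.07))
)

# Rename each value column after its list element so the columns stay distinct
renamed <- Map(function(df, nm) {
  names(df)[names(df) == "value"] <- nm
  df
}, profiles, names(profiles))

# Join all tables on gene_id; all = TRUE keeps genes missing from some files
wide <- Reduce(function(x, y) merge(x, y, by = "gene_id", all = TRUE), renamed)
```

Unlike binding columns side by side, merging on the ID does not rely on every file listing the genes in the same order.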

eg-r