use apply file list for a function in R

Question

I've created this function to get some information from two different files:

    create_distance_to_strand <- function(data, data2, nameCsv) {
      newData <- data %>%
        select(seqnames, start, end, V4, seqnames, geneStart, geneEnd, geneStrand, distanceToTSS, SYMBOL, geneLength) %>%
        rename(peak = V4)

      joined_df <- merge(newData, closest_dist, by.x = "peak",
                         by.y = "peak", all.x = TRUE, all.y = TRUE) %>% drop_na()

      write.table(joined_df, nameCsv, sep = "\t")

      return(joined_df)
    }

    closest_dist = read.csv("closest_distance", header = T, sep ="\t")
    annotation = read.csv("results_annotation", header = T, sep ="\t")
    myDistanceToStrand <- create_distance_to_strand(annotation, closest_dist, "frame.csv")

It works as expected. However, I'd like to make it more general so that, if I have several "closest_dist" files, I can apply the function to all of them. I've tried this:

    files <- list.files(pattern = "closest*")
    proof = lapply(files, create_distance_to_strand(annotation,  closest_dist, "proof.csv"))

But it does not work:

    Error in h(simpleError(msg, call)) : 
      error in evaluating the argument 'y' in selecting a method for function 'merge': object 'closest_dist' not found

Any advice? Thank you

Related post: https://stackoverflow.com/q/14958516/680068 – zx8754 Sep 24 '21 at 11:47

Ronak Shah · Accepted Answer · 2021-09-24T11:46:08.757


Since the names of your closest_dist files are stored in `files`, you can use `lapply` to pass them one by one. `lapply` expects a function as its second argument, so wrap the call in an anonymous function:

    files <- list.files(pattern = "closest*")
    proof = lapply(files, function(x) create_distance_to_strand(annotation,  x, "proof.csv"))
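As a minimal sketch of this pattern (not the original function): `x` above is a file name, and the function in the question merges against the global `closest_dist` rather than its `data2` argument, so each file still has to be read and handed to the merge. The toy data, the temporary files, and the helper `join_by_peak` below are all hypothetical stand-ins, written in base R only.

```r
# Toy stand-in for the annotation table (hypothetical data).
annotation <- data.frame(peak = c("p1", "p2", "p3"),
                         SYMBOL = c("A", "B", "C"))

# Write two small "closest" files into a fresh temporary directory.
tmp_dir <- tempfile("closest_demo")
dir.create(tmp_dir)
write.table(data.frame(peak = c("p1", "p2"), dist = c(10, 20)),
            file.path(tmp_dir, "closest_a.tsv"), sep = "\t", row.names = FALSE)
write.table(data.frame(peak = c("p2", "p3"), dist = c(5, 7)),
            file.path(tmp_dir, "closest_b.tsv"), sep = "\t", row.names = FALSE)

# Hypothetical helper: merges against its own argument, not a global object.
join_by_peak <- function(annotation, closest) {
  merge(annotation, closest, by = "peak")
}

files <- list.files(tmp_dir, pattern = "^closest", full.names = TRUE)

# lapply() needs a function; the anonymous function reads one file per call
# and passes the resulting data frame on to the helper.
results <- lapply(files, function(f) {
  closest <- read.delim(f, header = TRUE)
  join_by_peak(annotation, closest)
})
names(results) <- basename(files)
```

Each element of `results` is then the join for one input file, so the list can be inspected or combined afterwards.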

To get a separate output file for each input, pass a different `nameCsv` value for each one as well:

    Map(function(x, y) create_distance_to_strand(annotation, x, y), 
        files, sprintf('proof%d.csv', seq_along(files)))
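To illustrate how `Map` pairs each input with its own output name, here is a small self-contained sketch. The data frames, the temporary directory, and the helper `write_one` are assumptions made for the example; in practice the inputs would come from the closest_dist files.

```r
# Fresh temporary directory so the example does not touch the working directory.
out_dir <- tempfile("proof_demo")
dir.create(out_dir)

# Hypothetical inputs standing in for the joined results.
inputs <- list(first  = data.frame(peak = "p1", dist = 10),
               second = data.frame(peak = "p2", dist = 20))

# One output path per input: proof1.csv, proof2.csv, ...
outputs <- file.path(out_dir, sprintf("proof%d.csv", seq_along(inputs)))

# Hypothetical helper: writes one data frame and returns the path it wrote.
write_one <- function(df, path) {
  write.table(df, path, sep = "\t", row.names = FALSE)
  path
}

# Map() calls write_one(inputs[[i]], outputs[i]) for each i.
written <- Map(write_one, inputs, outputs)
```

Because every call receives a different path, no file is overwritten by a later iteration.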

edited Sep 24 '21 at 11:46

answered Sep 24 '21 at 11:39


My bad, I forgot the anonymous function. However, it only generates one csv, not one per file. Any idea how to solve it? Thanks! – lana Sep 24 '21 at 11:43

@lana That is because you are overwriting the same file, proof.csv. Use `paste0` to create files like proof1.csv, proof2.csv, etc. – zx8754 Sep 24 '21 at 11:46

Yes, that is because we are passing the same filename "proof.csv" in every iteration. You may create a vector of filenames and pass it separately using `Map`. See the updated answer. – Ronak Shah Sep 24 '21 at 11:48
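
For reference, a short sketch of two ways to build one output name per input file, along the lines suggested in the comments. The input file names are made up for illustration.

```r
# Hypothetical input file names.
files <- c("closest_a.tsv", "closest_b.tsv")

# Option 1: numbered names built with paste0().
numbered <- paste0("proof", seq_along(files), ".csv")

# Option 2: names derived from each input file, which keeps the link
# between an output and the input it came from.
derived <- paste0(tools::file_path_sans_ext(basename(files)), "_proof.csv")
```

Either vector can be passed as the second sequence to `Map` so that each input gets its own output file.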
