I've written this function to combine information from two different files:

    create_distance_to_strand <- function(data, data2, nameCsv) {
      newData <- data %>%
        select(seqnames, start, end, V4, geneStart, geneEnd, geneStrand, distanceToTSS, SYMBOL, geneLength) %>%
        rename(peak = V4)

      joined_df <- merge(newData, closest_dist, by.x = "peak",
                         by.y = "peak", all.x = TRUE, all.y = TRUE) %>% drop_na()

      write.table(joined_df, nameCsv, sep = "\t")

      return(joined_df)
    }
    
    closest_dist = read.csv("closest_distance", header = T, sep ="\t")
    annotation = read.csv("results_annotation", header = T, sep ="\t") 
    myDistanceToStrand <- create_distance_to_strand(annotation,  closest_dist, "frame.csv")

It works as expected. However, I'd like to generalize it so that, if I have several "closest_dist" files, I can apply the function to all of them. I've tried this:

    files <- list.files(pattern = "closest*")
    proof = lapply(files, create_distance_to_strand(annotation,  closest_dist, "proof.csv"))

But it does not work:

    Error in h(simpleError(msg, call)) : 
      error in evaluating the argument 'y' in selecting a method for function 'merge': object 'closest_dist' not found 

Any advice? Thank you

lana

1 Answer


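lapply expects a function as its second argument. In your attempt, create_distance_to_strand(annotation, closest_dist, "proof.csv") is a call, so R evaluates that call itself once instead of calling the function for each file. Since the names of your closest_dist files are stored in files, you can use lapply to pass them one at a time by wrapping the call in an anonymous function: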

    files <- list.files(pattern = "closest*")
    proof = lapply(files, function(x) create_distance_to_strand(annotation,  x, "proof.csv"))

To write a separate output file for each input, pass a different nameCsv value for each one as well, for example with Map:

    Map(function(x, y) create_distance_to_strand(annotation, x, y), 
        files, sprintf('proof%d.csv', seq_along(files)))
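
One caveat: as posted, the function merges with the global closest_dist object rather than its data2 argument, and files holds file names, not data frames. The following is a minimal, self-contained sketch of one way to handle both points, assuming each file is tab-separated and has a peak column. The helper join_peaks, the toy data, and the temporary directory are hypothetical stand-ins for illustration. It uses a plain inner merge in base R, a simplification of the full merge followed by drop_na() in the question.

```r
# Hypothetical helper: a simplified variant of create_distance_to_strand that
# uses its second argument (data2) instead of a global closest_dist object.
join_peaks <- function(data, data2, out_file) {
  # Inner join on "peak": only peaks present in both tables are kept
  joined <- merge(data, data2, by = "peak")
  write.table(joined, out_file, sep = "\t", row.names = FALSE, quote = FALSE)
  joined
}

# Toy data standing in for the real annotation and closest_distance files
tmp <- tempfile("demo_")
dir.create(tmp)
annotation <- data.frame(peak = c("p1", "p2", "p3"), SYMBOL = c("A", "B", "C"))
write.table(data.frame(peak = c("p1", "p2"), dist = c(10, 20)),
            file.path(tmp, "closest_1"), sep = "\t", row.names = FALSE)
write.table(data.frame(peak = c("p2", "p3"), dist = c(5, 7)),
            file.path(tmp, "closest_2"), sep = "\t", row.names = FALSE)

# One output name per input file
files <- list.files(tmp, pattern = "^closest", full.names = TRUE)
out_files <- file.path(tmp, sprintf("proof%d.csv", seq_along(files)))

# Read each input file inside the anonymous function, then join and write
results <- Map(function(f, out) {
  closest <- read.csv(f, header = TRUE, sep = "\t")
  join_peaks(annotation, closest, out)
}, files, out_files)
```

Note that the pattern argument of list.files is a regular expression, not a shell glob, so "^closest" is used here to match names that start with "closest".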
Ronak Shah
  • My bad, I forgot the anonymous function. However, it only generates one csv, not one per file. Any idea how to solve it? Thanks! – lana Sep 24 '21 at 11:43
  • @lana because you are overwriting the same file, proof.csv. Use paste0 to create files like proof1.csv, proof2.csv, etc. – zx8754 Sep 24 '21 at 11:46
  • Yes, that is because we are passing the same filename "proof.csv" in every iteration. You may create a vector of filenames and pass it separately using `Map`. See the updated answer. – Ronak Shah Sep 24 '21 at 11:48