
I have the following nested for loop in R, which is extremely slow. The loop matches values from two columns, then picks up the corresponding file and iterates through it to find a match, and finally extracts the matching row from that file. The number of iterations can exceed 100,000. Could someone provide some insight into how to speed this up?

for(i in 1: length(Jaspar_ids_in_Network)) {
  m <- Jaspar_ids_in_Network[i]
  gene_ids <- as.character(GeneTFS$GeneIds[i])
  gene_names <- as.character(GeneTFS$Genes[i])

  print("i")
  print(i)

  for(j in 1: length(Jaspar_ids_in_Exp)) {
    l <- Jaspar_ids_in_Exp[j]
    print("j")
    print(j)

    if (m == l) {
      check <- as.matrix(read.csv(file=paste0(dirpath,listoffiles[j]),sep=",",header=FALSE))
      data_check <- data.frame(check)
      for(k in 1: nrow(data_check)) {
        gene_ids_JF <- as.character(data_check[k,3])
        genenames_JF <- as.character(data_check[k,4])

        if(gene_ids_JF == gene_ids) {
          GeneTFS$Source[i] <- as.character(data_check[k,3])
          data1 <- rbind(data1, cbind(as.character(data_check[k,3]),  
                                      as.character(data_check[k,8]), 
                                      as.character(data_check[k,9]),  
                                      as.character(data_check[k,6]), 
                                      as.character(data_check[k,7]),  
                                      as.character(data_check[k,5])))
        } else if (toupper(genenames_JF) == toupper(gene_names)) { 
          GeneTFS$Source[i] <- as.character(data_check[k,4])
          data1 <- rbind(data1, cbind(as.character(data_check[k,4]),
                                      as.character(data_check[k,5]), 
                                      as.character(data_check[k,6]), 
                                      as.character(data_check[k,7]),
                                      as.character(data_check[k,8]),
                                      as.character(data_check[k,2])))
        } else {
         # GeneTFS[i,4] <- "No Evidence"    
        }
      }
    } else {
      # GeneTFS[i,4] <- "Record Not Found"          
    }
  }  
}
Scott Ritchie
user2498657
  • First of all, read in all files and put them in a list (and possibly `rbind` them in one big data.frame). Then you probably can use `merge` or use the data.table package and its joins. The bottom line is that you shouldn't use any `for` loops here. But if you do, you should definitely not grow an object in it. Can't tell you more since your example is not [reproducible](http://stackoverflow.com/a/5963610/1412059). – Roland Jan 08 '14 at 21:37
  • My professor's comment about R: never ever use loops! – Verena Haunschmid Jan 08 '14 at 21:55
  • Can you suggest how to replace them? I don't have much knowledge of R. – user2498657 Jan 08 '14 at 22:22
  • @ExpectoPatronum That position is too extreme. `for` loops have their place and can be very useful (if done right). – Roland Jan 09 '14 at 08:26
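
The approach suggested in the first comment (read every file once, `rbind` the results, then `merge`) might look roughly like the sketch below. Since the question is not reproducible, everything here is an assumption: the toy files, the `JasparId` column added to `GeneTFS`, and the helper `read_one` are hypothetical, and only the gene-id match (column 3 of each file) is handled, not the case-insensitive gene-name fallback.

```r
# Toy setup (all values hypothetical): two small files, one per JASPAR id.
dirpath <- tempfile("jaspar_")
dir.create(dirpath)
Jaspar_ids_in_Exp <- c("MA0001", "MA0002")
listoffiles <- paste0(Jaspar_ids_in_Exp, ".csv")
write.table(data.frame(1:2, c("a", "b"), c("101", "102"), c("TP53", "MYC")),
            file.path(dirpath, listoffiles[1]), sep = ",",
            col.names = FALSE, row.names = FALSE, quote = FALSE)
write.table(data.frame(3L, "c", "103", "GATA1"),
            file.path(dirpath, listoffiles[2]), sep = ",",
            col.names = FALSE, row.names = FALSE, quote = FALSE)

# Assumed layout: one row per gene, with the JASPAR id stored alongside it.
GeneTFS <- data.frame(JasparId = c("MA0001", "MA0002", "MA0001"),
                      GeneIds  = c("102", "999", "101"),
                      stringsAsFactors = FALSE)

# Read each file exactly once and tag its rows with the id it belongs to.
read_one <- function(id, f) {
  d <- read.csv(file.path(dirpath, f), header = FALSE,
                stringsAsFactors = FALSE, colClasses = "character")
  d$JasparId <- id
  d
}
all_files <- do.call(rbind, Map(read_one, Jaspar_ids_in_Exp, listoffiles))

# A single join on (JASPAR id, gene id) replaces the three nested loops.
matched <- merge(GeneTFS, all_files,
                 by.x = c("JasparId", "GeneIds"),
                 by.y = c("JasparId", "V3"))
print(matched)
```

Because each file is read once and nothing is grown row by row inside a loop, this avoids the two main costs in the original code: the repeated `read.csv` calls and the repeated `rbind` onto `data1`.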

1 Answer


If you pull the logic for processing one pair out into a function, `f(m, l)`, you can replace the double loop with:

outer(Jaspar_ids_in_Network, Jaspar_ids_in_Exp, Vectorize(f))
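
As a minimal sketch of how that call behaves, assume a hypothetical `f` that returns a single value per pair (here just whether the two ids match; the real `f` would hold the file-processing logic). The id vectors below are made up for illustration.

```r
# Hypothetical ids; the real vectors come from the question's data.
Jaspar_ids_in_Network <- c("MA0001", "MA0002", "MA0003")
Jaspar_ids_in_Exp     <- c("MA0002", "MA0003", "MA0004")

# Hypothetical per-pair function: takes one m and one l, returns one value.
f <- function(m, l) {
  if (m == l) TRUE else FALSE
}

# outer() passes whole vectors to its function, so Vectorize() wraps f
# to apply it element by element across all pairs.
res <- outer(Jaspar_ids_in_Network, Jaspar_ids_in_Exp, Vectorize(f))

# Row/column indices of the matching (i, j) pairs.
which(res, arr.ind = TRUE)
```

Note that `outer` expects one value back per pair, and that `f` is still evaluated for every combination of `m` and `l`, so this tidies the double loop rather than reducing the amount of work done.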
Neal Fultz