Making a nested for loop run faster in R

Question

I have the following nested for loop in R, and it is extremely slow. The loop matches values from two columns; on a match, it reads the corresponding file and iterates through it row by row looking for a matching gene ID or gene name, then appends that row to the results. The number of iterations can exceed 100,000. Can anyone suggest how to speed this up?

for(i in 1: length(Jaspar_ids_in_Network)) {
  m <- Jaspar_ids_in_Network[i]
  gene_ids <- as.character(GeneTFS$GeneIds[i])
  gene_names <- as.character(GeneTFS$Genes[i])

  print("i")
  print(i)

  for(j in 1: length(Jaspar_ids_in_Exp)) {
    l <- Jaspar_ids_in_Exp[j]
    print("j")
    print(j)

    if (m == l) {
      check <- as.matrix(read.csv(file=paste0(dirpath,listoffiles[j]),sep=",",header=FALSE))
      data_check <- data.frame(check)
      for(k in 1: nrow(data_check)) {
        gene_ids_JF <- as.character(data_check[k,3])
        genenames_JF <- as.character(data_check[k,4])

        if(gene_ids_JF == gene_ids) {
          GeneTFS$Source[i] <- as.character(data_check[k,3])
          data1 <- rbind(data1, cbind(as.character(data_check[k,3]),  
                                      as.character(data_check[k,8]), 
                                      as.character(data_check[k,9]),  
                                      as.character(data_check[k,6]), 
                                      as.character(data_check[k,7]),  
                                      as.character(data_check[k,5])))
        } else if (toupper(genenames_JF) == toupper(gene_names)) { 
          GeneTFS$Source[i] <- as.character(data_check[k,4])
          data1 <- rbind(data1, cbind(as.character(data_check[k,4]),
                                      as.character(data_check[k,5]), 
                                      as.character(data_check[k,6]), 
                                      as.character(data_check[k,7]),
                                      as.character(data_check[k,8]),
                                      as.character(data_check[k,2])))
        } else {
         # GeneTFS[i,4] <- "No Evidence"    
        }
      }
    } else {
      # GeneTFS[i,4] <- "Record Not Found"          
    }
  }  
}

First of all, read in all files and put them in a list (and possibly `rbind` them in one big data.frame). Then you probably can use `merge` or use the data.table package and its joins. The bottom line is that you shouldn't use any `for` loops here. But if you do, you should definitely not grow an object in it. Can't tell you more since your example is not [reproducible](http://stackoverflow.com/a/5963610/1412059). — Roland, Jan 08 '14 at 21:37
Can you suggest how to replace them? I don't have much knowledge of R. — user2498657, Jan 08 '14 at 22:22
@ExpectoPatronum That position is too extreme. `for` loops have their place and can be very useful (if done right). — Roland, Jan 09 '14 at 08:26
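
Roland's suggestion in the first comment (read every file once, stack the rows, then join) might look roughly like the sketch below. It is only an illustration under stated assumptions: the demo files, their five-column layout, and the helper names `read_one`, `all_hits`, `by_id`, `by_name` and `NameKey` are all hypothetical, and it assumes row `i` of `GeneTFS` belongs to `Jaspar_ids_in_Network[i]`, as the indexing in the question implies. It uses `file.path()` where the question uses `paste0(dirpath, ...)`.

```r
# Hypothetical stand-in data so the sketch runs on its own.
dirpath <- file.path(tempdir(), "jaspar_demo")
dir.create(dirpath, showWarnings = FALSE)
Jaspar_ids_in_Exp <- c("MA0001", "MA0002")
listoffiles <- paste0(Jaspar_ids_in_Exp, ".csv")

demo1 <- data.frame(c(1, 2), c("x", "y"), c("g1", "g2"), c("abc", "def"), c(5, 6))
demo2 <- data.frame(c(3, 4), c("x", "y"), c("g9", "g3"), c("qqq", "rrr"), c(7, 8))
write.table(demo1, file.path(dirpath, listoffiles[1]), sep = ",",
            row.names = FALSE, col.names = FALSE)
write.table(demo2, file.path(dirpath, listoffiles[2]), sep = ",",
            row.names = FALSE, col.names = FALSE)

GeneTFS <- data.frame(GeneIds = c("g1", "zz", "g9"),
                      Genes   = c("ABC", "DEF", "none"),
                      stringsAsFactors = FALSE)
Jaspar_ids_in_Network <- c("MA0001", "MA0001", "MA0002")

# Step 1: read each file exactly once and tag its rows with the Jaspar id.
read_one <- function(file, id) {
  d <- read.csv(file.path(dirpath, file), header = FALSE,
                colClasses = "character")
  d$JasparId <- id
  d
}
# do.call(rbind, <list>) stacks everything in one go instead of growing
# a result with rbind() inside a loop.
all_hits <- do.call(rbind, Map(read_one, listoffiles, Jaspar_ids_in_Exp))

# Step 2: attach the Jaspar id to each gene row.
GeneTFS$JasparId <- Jaspar_ids_in_Network

# Step 3a: matches on gene id (column 3 of the files).
by_id <- merge(GeneTFS, all_hits,
               by.x = c("JasparId", "GeneIds"),
               by.y = c("JasparId", "V3"))

# Step 3b: case-insensitive matches on gene name (column 4), keeping only
# rows that did not already match on id (the `else if` in the question).
left  <- transform(GeneTFS,  NameKey = toupper(Genes))
right <- transform(all_hits, NameKey = toupper(V4))
by_name <- merge(left, right, by = c("JasparId", "NameKey"))
by_name <- by_name[by_name$GeneIds != by_name$V3, ]
```

Selecting and reordering the output columns to reproduce `data1` from the question is left out here, since the real files have more columns than this demo data.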

score 0 · Answer 1 · answered Jan 08 '14 at 21:40

If you pull out the logic for processing one pair into a function, f(m,l), then you could replace the double loop with:

outer(Jaspar_ids_in_Network, Jaspar_ids_in_Exp, Vectorize(f))
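
As a minimal sketch of how that call behaves, assume a hypothetical `f` that returns exactly one value per pair (`outer` needs a result with one element per combination). The id vectors below are made-up stand-ins for the ones in the question.

```r
# Made-up stand-ins for the vectors in the question.
Jaspar_ids_in_Network <- c("MA0001", "MA0003", "MA0002")
Jaspar_ids_in_Exp     <- c("MA0002", "MA0001")

# Hypothetical per-pair function: must return a single value for each (m, l).
f <- function(m, l) if (m == l) nchar(m) else NA_integer_

# Vectorize() wraps f with mapply(), so outer() can pass it whole vectors.
res <- outer(Jaspar_ids_in_Network, Jaspar_ids_in_Exp, Vectorize(f))

# Rows index Jaspar_ids_in_Network, columns index Jaspar_ids_in_Exp.
# If only the matching (i, j) positions are needed, "==" is already
# vectorized and no wrapper is required:
pairs <- which(outer(Jaspar_ids_in_Network, Jaspar_ids_in_Exp, "=="),
               arr.ind = TRUE)
```

Note that `Vectorize(f)` still calls `f` once per pair, so this mainly tidies the code; the speed-up would have to come from what `f` does, for example not re-reading a file on every match.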

Neal Fultz
