
I have the following loop, which I need to apply to a very large dataset, but it is very slow. Can anyone suggest ways to speed it up?

```r
### calculate distance ###
# matrices for results
distsum <- matrix(ncol = 1, nrow = nrow(data))
grp1sum <- matrix(ncol = 1, nrow = nrow(data))
# loop over each observation
for (i in 1:nrow(data)) {
  # smaller search radius for these values of column 21
  if (data[i, 21] %in% c(101, 155, 147, 185)) {
    limit <- 5
  } else {
    limit <- 15
  }
  # reset counters
  dist_count <- 0
  grp1_count <- 0
  # loop over all other observations
  for (j in 1:nrow(data)) {
    # check whether the date intervals overlap, i.e. whether the two firms are present at the same time
    # (assuming column 37 holds the start date and column 38 the end date)
    cond_A <- data[i, 37] > data[j, 38]
    cond_B <- data[i, 38] < data[j, 37]
    if (data[i, 21] == data[j, 21] && !(cond_A || cond_B) && data[i, 2] == data[j, 2]) {
      # measure the distance; gcd.hf() is defined elsewhere (not shown)
      distance <- gcd.hf(data[i, 36], data[i, 35], data[j, 36], data[j, 35])
      if (distance <= limit && distance != 0) {
        dist_count <- dist_count + 1           # count number of physicians within the limit
        grp1_count <- grp1_count + data[j, 5]  # sum number of registered patients
      }
    }
  }
  distsum[i, ] <- dist_count
  grp1sum[i, ] <- grp1_count
}
```
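One common first step is to vectorize the inner `j` loop: for each `i`, compare row `i` against all rows at once and compute distances only for the rows that pass the group and date checks. The sketch below assumes `data` is a data frame (or numeric matrix) with the column layout used in the loop, and that column 37 is the start date and column 38 the end date. Because `gcd.hf()` is not shown in the question, it is replaced by a hypothetical haversine function `haversine_km()` that takes degrees; if `gcd.hf()` expects radians, convert accordingly. `count_neighbours()` is likewise a hypothetical name. It still performs on the order of n² comparisons, but each pass over `j` runs as vectorized operations instead of an interpreted loop body.

```r
# Hypothetical stand-in for gcd.hf(): haversine great-circle distance in km.
# Assumes longitude/latitude in degrees; vectorized over lon2/lat2.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Same counts as the nested loop, but the inner j loop is replaced by vector operations.
# Column positions follow the question; column 37 = start, 38 = end date (assumed).
count_neighbours <- function(data) {
  n        <- nrow(data)
  grp      <- data[, 21]
  key2     <- data[, 2]
  start    <- data[, 37]
  end      <- data[, 38]
  lat      <- data[, 35]
  lon      <- data[, 36]
  patients <- data[, 5]
  # the radius depends only on row i, so compute it once for all rows
  limit <- ifelse(grp %in% c(101, 155, 147, 185), 5, 15)

  distsum <- numeric(n)
  grp1sum <- numeric(n)
  for (i in seq_len(n)) {
    # candidate rows: same group, same column-2 value, overlapping date intervals
    cand <- which(grp == grp[i] & key2 == key2[i] &
                    !(start[i] > end | end[i] < start))
    # distances from row i to all candidates in one call
    d   <- haversine_km(lon[i], lat[i], lon[cand], lat[cand])
    # d != 0 excludes row i itself (and any co-located rows), as in the original
    hit <- cand[d <= limit[i] & d != 0]
    distsum[i] <- length(hit)
    grp1sum[i] <- sum(patients[hit])
  }
  data.frame(distsum = distsum, grp1sum = grp1sum)
}

# Tiny synthetic example (made-up values) using the 38-column layout
toy <- as.data.frame(matrix(0, nrow = 4, ncol = 38))
toy[, 21] <- c(101, 101, 101, 200)
toy[, 2]  <- 1
toy[, 37] <- c(1, 5, 20, 1)
toy[, 38] <- c(10, 15, 30, 30)
toy[, 35] <- c(55, 55.01, 55.02, 55)
toy[, 36] <- c(12, 12, 12, 12.05)
toy[, 5]  <- c(100, 200, 300, 400)
res <- count_neighbours(toy)
# res$distsum is c(1, 1, 0, 0); res$grp1sum is c(200, 100, 0, 0)
```

Computing `limit` once with `%in%` and `ifelse()` also removes the per-row `if` from the outer loop.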
Rud Faden
  • Please make your code [reproducible](http://stackoverflow.com/a/5963610/1412059). – Roland Nov 18 '13 at 08:36
  • Maybe post this on [Code Review](http://codereview.stackexchange.com/) instead? – Simon O'Hanlon Nov 18 '13 at 08:55
  • For starters: your first loop `i in 1:nrow(data)` doesn't do anything useful, since you overwrite `limit` every time. What do you actually expect to get out of that loop? Then, instead of creating `cond_A` and `cond_B` and then evaluating `!(cond_A | cond_B)`, just evaluate the conditionals directly. However, that's separate from the main unknown: what do you consider "too slow" to be, and how big is your dataset? Certainly your `j` loop could be turned into a multicore process, but we can't tell whether that's really necessary. – Carl Witthoft Nov 18 '13 at 12:33
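Along the lines of the comment above, one further option is to avoid comparing rows that can never match. The loop only counts pairs that share both column 21 and column 2, so the data can be split into those groups first and each group processed independently; that also makes the work easy to spread over cores with `parallel::mclapply()`. This is a sketch under assumptions: `data` is a data frame (so `split()` keeps row names), the column layout is the one in the question, column 37/38 are the start/end dates, and `haversine_km()` (a degree-based stand-in for the unseen `gcd.hf()`), `count_block()` and `count_neighbours_grouped()` are hypothetical names. `mclapply()` relies on forking, so `cores` must stay at 1 on Windows.

```r
library(parallel)

# Hypothetical stand-in for gcd.hf(): haversine distance in km, degrees assumed.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  a <- sin((lat2 - lat1) * to_rad / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin((lon2 - lon1) * to_rad / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Count neighbours within one block of rows that already share columns 21 and 2.
count_block <- function(block) {
  n <- nrow(block)
  limit <- ifelse(block[, 21] %in% c(101, 155, 147, 185), 5, 15)
  out <- data.frame(row = as.integer(rownames(block)), distsum = 0, grp1sum = 0)
  for (i in seq_len(n)) {
    # overlap test written directly, without intermediate cond_A / cond_B
    overlap <- !(block[i, 37] > block[, 38] | block[i, 38] < block[, 37])
    d   <- haversine_km(block[i, 36], block[i, 35], block[, 36], block[, 35])
    hit <- overlap & d <= limit[i] & d != 0
    out$distsum[i] <- sum(hit)
    out$grp1sum[i] <- sum(block[hit, 5])
  }
  out
}

# Split by (column 21, column 2), process groups (optionally in parallel),
# then restore the original row order.
count_neighbours_grouped <- function(data, cores = 1L) {
  rownames(data) <- seq_len(nrow(data))
  blocks <- split(data, list(data[, 21], data[, 2]), drop = TRUE)
  res <- do.call(rbind, mclapply(blocks, count_block, mc.cores = cores))
  res <- res[order(res$row), ]
  rownames(res) <- NULL
  res
}
```

For example, `count_neighbours_grouped(data, cores = 4)` would return one row per observation with `distsum` and `grp1sum` in the original row order. How much the split helps depends on how many distinct (column 21, column 2) combinations the real data contains.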
