So basically, I have two separate groups of columns within a data frame: the names of locations with their associated latitude and longitude ranges, and a list of exact lat and long values. For each EXACT lat/long pair, I want to iterate through the lat/long RANGES to see if the pair falls within a range. If it does, I want to write the name of the location next to the lat/long pair.

So, my code DOES work. However, it takes about 40 minutes to run. I don't know if this is normal or if I'm doing this in a terribly structured/inefficient way. Any thoughts and input would be greatly appreciated!

for (x in 1:35274) {
  lat = test[x,9]
  long = test[x,10]
  for (y in 1:1198) {
    if ((((test[y,3] <= lat) & (lat <= test[y,2])) &
         ((test[y,4] <= long) & (test[y,5] >= long))) == TRUE)
      test[x,12] <- test[y,1]
  }
}
Jaap
Jennifer
  • Read http://www.burns-stat.com/pages/Tutor/R_inferno.pdf. You should be able to get rid of most loops. – RockScience Jan 28 '16 at 03:38
  • Can you edit with the results of `dput` on a representative sample data set? And yes, these loops are unnecessary. Vectorize; it's the R way. – alistaire Jan 28 '16 at 03:42
  • Looks like you don't even need to loop over y, just do all the y rows at once. – rawr Jan 28 '16 at 04:14
  • I suspect that `findInterval` would be very useful, but for this sort of question I don't build example datasets ... that is _your_ responsibility! – IRTFM Jan 28 '16 at 04:35
  • Please provide a _minimal, self contained_ example: see [**here**](http://stackoverflow.com/help/mcve) and [**here**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). – Henrik Jan 28 '16 at 07:16

1 Answer

Basically, all you are doing here is selecting the first 35274 entries of test[,9] and test[,10], but you are doing it one value at a time. It's better to just do this all in one go:

n <- 1:35274
lat <- test[n,9]
long <- test[n,10]

Then, within your 35274 loops you do another 1198 loops, which is a lot of operations to run (42,258,252 loops!). This is probably where you're having some problems. This also seems like it is not achieving what you want. Basically, for any given x, if multiple y satisfy the conditions, only the last y will be written down and any y before that will be overwritten. In other words, if there are multiple locations that fall within the range of your lat/long, then only one will be written down.
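The overwriting behaviour described above can be seen on a toy example. This is only a sketch: the data frame `ranges`, its column names, and the point `pt_lat`/`pt_lon` are made up for illustration and are not the asker's real data.

```r
# Toy data with hypothetical column names (not the asker's real columns).
# Two location ranges that overlap, and one point that lies inside both.
ranges <- data.frame(
  name    = c("A", "B"),
  lat_min = c(0, 5),
  lat_max = c(10, 15),
  lon_min = c(0, 5),
  lon_max = c(10, 15),
  stringsAsFactors = FALSE
)
pt_lat <- 7
pt_lon <- 7

# Same logic as the inner loop in the question: every match overwrites the
# previous one, so only the last matching range is kept.
match_name <- NA_character_
for (y in seq_len(nrow(ranges))) {
  if (ranges$lat_min[y] <= pt_lat && pt_lat <= ranges$lat_max[y] &&
      ranges$lon_min[y] <= pt_lon && pt_lon <= ranges$lon_max[y]) {
    match_name <- ranges$name[y]
  }
}

# Number of inner-loop iterations for the sizes quoted in the question.
n_iterations <- 35274 * 1198
```

Here the point lies inside both "A" and "B", but `match_name` ends up holding only the name from the last matching row.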

Assuming that you only expect one location per x, you can reduce this to 1198 loops by simply doing this:

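m <- 1198
vec <- rep(NA_character_, length(n))
for (y in 1:m){
  # TRUE for every lat/long pair that falls inside the range of location y
  ind <- (test[y,3] <= lat) & (test[y,2] >= lat) & (test[y,4] <= long) & (test[y,5] >= long)
  # which() skips NA comparisons; as.character() guards against factor columns
  vec[which(ind)] <- as.character(test[y,1])
}
test[n,12] <- vec

If multiple locations are possible, simply use lis <- vector("list", m) to initialize and then store the matching rows for each location with lis[[y]] <- which(ind).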

There are probably better ways to do this as well, but this should already save you a lot of time.
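For reference, here is a minimal self-contained sketch of the same idea on toy data: loop over the locations only, and test all points against each range in one vectorized comparison. The data frames `ranges` and `points` and all of their column names are assumptions made up for this example; they are not the columns of the asker's `test` data frame.

```r
# Toy data; all column names here are hypothetical stand-ins.
ranges <- data.frame(
  name    = c("North", "South", "Overlap"),
  lat_min = c(10, -20, 12),
  lat_max = c(20, -10, 18),
  lon_min = c(0, 0, 2),
  lon_max = c(10, 10, 8),
  stringsAsFactors = FALSE
)
points <- data.frame(
  lat = c(11, -15, 15, 50),
  lon = c(1, 5, 5, 5)
)

# One name per point (last matching range wins, as in the original loops).
points$location <- NA_character_
# All matching names per point, for the case where ranges can overlap.
all_matches <- vector("list", nrow(points))

for (y in seq_len(nrow(ranges))) {
  # Logical vector: TRUE for each point inside range y.
  inside <- ranges$lat_min[y] <= points$lat & points$lat <= ranges$lat_max[y] &
            ranges$lon_min[y] <= points$lon & points$lon <= ranges$lon_max[y]
  hits <- which(inside)  # which() drops NA comparisons
  points$location[hits] <- ranges$name[y]
  for (i in hits) all_matches[[i]] <- c(all_matches[[i]], ranges$name[y])
}
```

In this toy data the third point lies inside both "North" and "Overlap", so `points$location` keeps only one of them while `all_matches[[3]]` keeps both. The fourth point lies in no range and stays `NA`.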

slamballais