
I have two data frames, "a" and "b". Both contain GPS data, but "a" has 1000 rows and "b" has 5 rows. I am computing distances with the haversine formula, and I want to apply the function so that each row of "a" is compared to every row of "b". I should end up with 5000 results.

This is what I have so far, but it only gives me 1000 results:

library(geosphere)

for(i in 1:nrow(a)){
  distHaversine(a[,c(11,9)],b[,c(4,2)])
}

Thanks in advance for any assistance.

EDIT

I found a much better solution to my problem that cuts down on both code and computing time:

library(geosphere)

result <- distm(a[ , c(11, 9)], b[ , c(4, 2)], fun = distHaversine)
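
For anyone who wants to check the shape of such an all-pairs result without the real data, here is a minimal base-R sketch of the same idea. The `haversine_m` helper, the toy data frames `a_toy` and `b_toy`, and their coordinates are made up for illustration; the radius of 6378137 m is assumed to match the default of `distHaversine`.

```r
# Hypothetical helper: great-circle distance in metres (haversine formula).
# Inputs are in degrees; r is an assumed Earth radius in metres.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(h)))
}

# Toy stand-ins for "a" and "b" (made-up coordinates).
a_toy <- data.frame(lon = c(0, 10, 20), lat = c(0, 10, 20))
b_toy <- data.frame(lon = c(0, 5), lat = c(0, 5))

# outer() passes every (i, j) index pair to the vectorised helper and
# returns a matrix with one row per row of a_toy, one column per row of b_toy.
result_toy <- outer(
  seq_len(nrow(a_toy)), seq_len(nrow(b_toy)),
  function(i, j) haversine_m(a_toy$lon[i], a_toy$lat[i],
                             b_toy$lon[j], b_toy$lat[j])
)
dim(result_toy)
```

With the real data the same layout gives 1000 rows and 5 columns, so the matrix holds the 5000 distances.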
www
Sam
  • Are you able to share your dataset? Try `dput(a)` and `dput(b)`. – Andrew Brēza Jul 13 '17 at 12:41
  • Just go [through this link](http://stackoverflow.com/questions/5963269) – Sotos Jul 13 '17 at 12:42
  • The information in the dataset is sensitive, but I figured it wasn't important anyway. The distHaversine() function works fine, I just need to know how to loop it so it applies each line of "a" to each line of "b" – Sam Jul 13 '17 at 12:48
  • Did you maybe just forget the `i` in your for loop? `a[i,c(11,9)], b[i,c(4,2)]` – Jakob Gepp Jul 13 '17 at 13:12
  • You're right, Jakob Gepp, I did (foolishly) forget that part of it. – Sam Jul 13 '17 at 15:51

2 Answers

2

Maybe something like the following.

result <- matrix(numeric(nrow(a)*nrow(b)), ncol = nrow(b))

for(i in seq_len(nrow(a))){
    for(j in seq_len(nrow(b))){
        result[i, j] <- distHaversine(a[i, c(11, 9)],b[j, c(4, 2)])
    }
}

result
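
If one row per pair is preferred over a matrix (5000 rows for the data in the question), the matrix can be flattened afterwards. A minimal sketch, assuming a toy matrix `dist_toy` with made-up values in place of `result`; the column names `a_row`, `b_row` and `dist` are arbitrary:

```r
# Toy stand-in for the distance matrix: 3 rows of "a" x 2 rows of "b"
# (made-up values, not real distances).
dist_toy <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 3, ncol = 2)

# One row per (a, b) pair. row() and col() give the matrix indices of each
# cell, and as.vector() reads all three column by column, so they line up.
long <- data.frame(
  a_row = as.vector(row(dist_toy)),
  b_row = as.vector(col(dist_toy)),
  dist  = as.vector(dist_toy)
)
nrow(long)
```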
Rui Barradas
  • This works great, I get a matrix with a row for every row "a", and a column for every row "b", with distances calculated between each one. – Sam Jul 13 '17 at 12:57
0

This could be a solution for you:

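# all combinations of row indices of "a" and "b"
indx <- expand.grid(a = seq_len(nrow(a)), b = seq_len(nrow(b)))

# distance for each pair, using the lon/lat columns from the question
res <- apply(indx, 1, function(x) distHaversine(a[x[1], c(11, 9)], b[x[2], c(4, 2)]))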

With expand.grid I combine all row indices of both data.frames and then use them for indexing inside an apply function.

To trace back which distance you calculated, you can add the result as a column to the indices.

> head(cbind(indx,res))
  a b      res
1 1 1 12318145
2 2 1  5528108
3 3 1 11090739
4 4 1 14962267
5 5 1 19480911
6 6 1  8936878
Val