Apply a function for every row in a dataframe for every row in another dataframe

Question

I have two dataframes, "a" and "b". They both have gps data, but "a" has 1000 rows and "b" has 5 rows. I am comparing distances with the haversine formula, but I want to apply the function so that each row of "a" is compared to every row of "b". I should end up with 5000 results.

This is what I have so far, but it only gives me 1000 results:

library(geosphere)

for(i in 1:nrow(a)){
  distHaversine(a[,c(11,9)],b[,c(4,2)])
}

Thanks in advance for any assistance.

EDIT

I found a much better solution to my problem that cuts down on both code and computing time:

library(geosphere)

result <- distm(a[ , c(11, 9)], b[ , c(4, 2)], fun = distHaversine)
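As a quick check of the shape this returns, here is a minimal sketch on made-up coordinates. The data frames `a_demo` and `b_demo` and their `lon`/`lat` column names are hypothetical stand-ins for the real data, which I can't share. `distm()` expects longitude first, then latitude, and gives one row per row of the first argument and one column per row of the second:

```r
library(geosphere)

# Hypothetical stand-ins for the real data: 3 points and 2 reference points.
a_demo <- data.frame(lon = c(-0.1276, 2.3522, 13.4050),
                     lat = c(51.5072, 48.8566, 52.5200))
b_demo <- data.frame(lon = c(-0.1276, -74.0060),
                     lat = c(51.5072, 40.7128))

# Longitude must come first, then latitude.
demo_result <- distm(a_demo[, c("lon", "lat")],
                     b_demo[, c("lon", "lat")],
                     fun = distHaversine)

dim(demo_result)     # rows follow a_demo, columns follow b_demo
length(demo_result)  # nrow(a_demo) * nrow(b_demo) distances in total
```

With the real data the same call should give a 1000 x 5 matrix, i.e. the 5000 distances I was after.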

Are you able to share your dataset? Try `dput(a)` and `dput(b)`. — Andrew Brēza, Jul 13 '17 at 12:41
Just go [through this link](http://stackoverflow.com/questions/5963269) — Sotos, Jul 13 '17 at 12:42
The information in the dataset is sensitive, but I figured it wasn't important anyway. The distHaversine() function works fine, I just need to know how to loop it so it applies each line of "a" to each line of "b" — Sam, Jul 13 '17 at 12:48
Did you maybe just forget the `i` in your for loop? `a[i,c(11,9)], b[i,c(4,2)]` — Jakob Gepp, Jul 13 '17 at 13:12
You're right, Jakob Gepp, I did (foolishly) forget that part of it. — Sam, Jul 13 '17 at 15:51

score 2 · Accepted Answer · answered Jul 13 '17 at 12:47


Maybe something like the following.

result <- matrix(numeric(nrow(a)*nrow(b)), ncol = nrow(b))

for(i in seq_len(nrow(a))){
    for(j in seq_len(nrow(b))){
        result[i, j] <- distHaversine(a[i, c(11, 9)],b[j, c(4, 2)])
    }
}

result

answered Jul 13 '17 at 12:47

Rui Barradas


This works great. I get a matrix with a row for every row of "a" and a column for every row of "b", with the distance calculated for each pair. – Sam Jul 13 '17 at 12:57

score 0 · Answer 2 · answered Jul 13 '17 at 12:55

This could be a solution for you:

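# All combinations of row indices: nrow(a) * nrow(b) pairs
indx <- expand.grid(a = seq_len(nrow(a)), b = seq_len(nrow(b)))

# Pass only the longitude/latitude columns, using the column positions from the question
res <- apply(indx, 1, function(x) distHaversine(a[x[1], c(11, 9)], b[x[2], c(4, 2)]))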

With expand.grid I combine all row indices of both data.frames and then use them for indexing inside an apply function.

To trace back which distance you calculated, you can add the result as a column to the indices.

> head(cbind(indx,res))
  a b      res
1 1 1 12318145
2 2 1  5528108
3 3 1 11090739
4 4 1 14962267
5 5 1 19480911
6 6 1  8936878
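The index-grid idea can also be tried without the original data or even without geosphere. The following is only a sketch under stated assumptions: `haversine_m()` is a hand-written haversine helper (not the geosphere function), and `a_demo`, `b_demo` and their `lon`/`lat` columns are made-up stand-ins. Because the helper is vectorised, the whole grid is computed in one call instead of row by row with `apply`:

```r
# Hand-written haversine distance in metres (an illustrative helper, not
# geosphere's implementation). Inputs are in degrees; the radius is the
# 6378137 m default documented for distHaversine().
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(h)))
}

# Hypothetical stand-ins for the two data frames.
a_demo <- data.frame(lon = c(-0.1276, 2.3522, 13.4050),
                     lat = c(51.5072, 48.8566, 52.5200))
b_demo <- data.frame(lon = c(-0.1276, -74.0060),
                     lat = c(51.5072, 40.7128))

# Every pairing of a row of a_demo with a row of b_demo.
indx <- expand.grid(a = seq_len(nrow(a_demo)), b = seq_len(nrow(b_demo)))

# One vectorised call over all pairs; the indices stay alongside the result.
indx$res <- haversine_m(a_demo$lon[indx$a], a_demo$lat[indx$a],
                        b_demo$lon[indx$b], b_demo$lat[indx$b])

# expand.grid varies its first factor fastest, so a column-wise fill gives
# a matrix with rows following a_demo and columns following b_demo.
res_matrix <- matrix(indx$res, nrow = nrow(a_demo))
```

Keeping the long form (`indx`) is handy for filtering or merging by pair, while `res_matrix` matches the layout of the nested-loop answer.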
