loop through 2 dataframes

Question

I am new to R and trying to loop through each row of df1 and search for rows in df2 that are close in distance (5mi/8046.72m). I think df1 is looping as intended but I don't think it is going through all of df2.

{for (i in 1:1452){

p1 <- df1[i, 4:5]
p2 <- df2[1:11, 2:3]

d <- distCosine(p1, p2, r=6378137)

return(d< 8046.72)
i <- i+1}
}

I get the output:

[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Hi, welcome to SO. Please consider reading up on [ask] and how to produce a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It makes it easier for others to help you. — Heroka, Feb 23 '16 at 15:56
In your current code, there are some weird things going on. Where are you returning to? And why are you manually increasing i? — Heroka, Feb 23 '16 at 15:57

score 1 · Answer 1 · answered Feb 23 '16 at 16:08

I would just use an apply function. First, let's make your problem reproducible by creating some "fake" data - I am making the lon/lat pairs artificially close so that we can get a few TRUE's back in the results:

library(geosphere)

set.seed(42)  # fix the random draws so the fake data is the same on every run

df1 <- data.frame(x1 = sample(letters, 100, replace = TRUE),
                  x2 = sample(letters, 100, replace = TRUE),
                  x3 = sample(letters, 100, replace = TRUE),
                  lon = sample(10:12 + rnorm(100, 0, 0.1), 100, replace = TRUE),
                  lat = sample(10:12 + rnorm(100, 0, 0.1), 100, replace = TRUE))

df2 <- data.frame(x1 = sample(letters, 100, replace = TRUE),
                  lon = sample(10:12 + rnorm(100, 0, 0.1), 100, replace = TRUE),
                  lat = sample(10:12 + rnorm(100, 0, 0.1), 100, replace = TRUE))

We can then create two matrices containing the values of interest:

m1 <- as.matrix(df1[, c("lon", "lat")])
m2 <- as.matrix(df2[1:11, c("lon", "lat")])

Now we can apply distCosine across the rows of m2, which returns a 100 x 11 matrix of distances in meters (one row per point in m1, one column per point in m2):

results <- apply(m2, 1, FUN = function(x) distCosine(x, m1))

To get the results under 5 mi (8046.72 m), we simply subset:

results[results < 8046.72]

# Alternative outputs: linear indices first, then (row, col) index pairs
which(results < 8046.72)
which(results < 8046.72, arr.ind = TRUE)
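
If the data frames carry row names, the (row, col) pairs from which(..., arr.ind = TRUE) can be mapped back to them. Here is a minimal sketch of that idea, assuming a tiny hand-made distance matrix in place of the real results; the names df1_ids, df2_ids and matches are hypothetical, and with the real data you would use rownames(df1) and rownames(df2)[1:11] instead:

```r
# Hypothetical 3 x 2 distance matrix in meters, standing in for `results`
# (rows correspond to df1 points, columns to df2 points)
results <- matrix(c(1200, 9500, 7000,
                    15000, 300, 8100),
                  nrow = 3, ncol = 2)

df1_ids <- c("a1", "a2", "a3")  # stand-in for rownames(df1)
df2_ids <- c("b1", "b2")        # stand-in for rownames(df2)[1:11]

# Two-column index matrix ("row", "col") of the pairs within 5 mi
idx <- which(results < 8046.72, arr.ind = TRUE)

# Look up the ids by position and keep the distance alongside them
matches <- data.frame(df1_id = df1_ids[idx[, "row"]],
                      df2_id = df2_ids[idx[, "col"]],
                      dist_m = results[idx],
                      stringsAsFactors = FALSE)
matches
```

Each row of matches is one df1/df2 pair that falls inside the threshold, so the ids can be joined back onto either data frame.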

Note: In your question, it looks like you are interested in the first 1,452 rows of df1, which would make the results a 1,452 x 11 matrix.
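
For comparison, the same TRUE/FALSE result can be built with an explicit loop. This is only a sketch under assumptions: small made-up data frames stand in for df1 and df2, and the name within_5mi is hypothetical. It stores each row's comparison in a preallocated matrix instead of calling return(), and lets for advance i on its own rather than incrementing it by hand:

```r
library(geosphere)

# Hypothetical stand-in data: 5 points in df1, 3 in df2 (degrees)
set.seed(1)
df1 <- data.frame(lon = 10 + runif(5, 0, 0.05), lat = 10 + runif(5, 0, 0.05))
df2 <- data.frame(lon = 10 + runif(3, 0, 0.05), lat = 10 + runif(3, 0, 0.05))

m1 <- as.matrix(df1[, c("lon", "lat")])
m2 <- as.matrix(df2[, c("lon", "lat")])

# One row per df1 point, one column per df2 point, labeled by row names
within_5mi <- matrix(NA, nrow = nrow(m1), ncol = nrow(m2),
                     dimnames = list(rownames(df1), rownames(df2)))

for (i in seq_len(nrow(m1))) {
  # Distances in meters from df1 point i to every df2 point
  d <- distCosine(m1[i, ], m2)
  within_5mi[i, ] <- d < 8046.72
}

within_5mi
```

With the real data, df2[1:11, ] and the first 1,452 rows of df1 would take the place of the stand-ins.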

thanks, I added row names to each df, is there a way to keep these to identify which rows match based on their id? — KerryLee, Feb 23 '16 at 18:21
