I know the formula `sqrt((x1-x0)^2 + (y1-y0)^2)` for the distance between two points. But I have two columns, Latitude and Longitude, and for each county I want to find the variance across its 4 closest counties.
Do I need a loop?
I also have one more column with percentages; every county has a percentage. So what I need is the variance of the percentages among the closest counties.

bill89
-
Welcome to StackOverflow! Please provide a [MCVE] – Steven Beaupré May 09 '16 at 16:09
-
What do you mean by "variance" here? Statistical variance? If so, of what quantity of the counties? – alistaire May 09 '16 at 16:11
-
You're using the Euclidean distance for a plane, but the distance on a sphere is different. The `geosphere` package has functions for this (see [this SO answer](http://stackoverflow.com/a/32364246/496488), for example). You can also roll your own function: See [here](http://www.r-bloggers.com/great-circle-distance-calculations-in-r/) and [here](http://stackoverflow.com/questions/29585759/calculating-distances-from-latitude-and-longitude-coordinates-in-r). – eipi10 May 09 '16 at 16:14
-
I am using 2D because I am going to do mapping after that. – bill89 May 09 '16 at 16:18
1 Answer
No, you don't need an explicit loop for this.
Instead, you'd be better off creating an `earth.dist` function as follows:
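    earth.dist <- function(long1, lat1, long2, lat2) {
      rad <- pi / 180     # degrees to radians
      a1 <- lat1 * rad    # latitude of point 1
      a2 <- long1 * rad   # longitude of point 1
      b1 <- lat2 * rad    # latitude of point 2
      b2 <- long2 * rad   # longitude of point 2
      dlon <- b2 - a2
      dlat <- b1 - a1
      # Haversine formula for the central angle between the two points
      a <- (sin(dlat / 2))^2 + cos(a1) * cos(b1) * (sin(dlon / 2))^2
      c <- 2 * atan2(sqrt(a), sqrt(1 - a))
      R <- 6378.145       # Earth radius in km
      d <- R * c          # great-circle distance in km
      return(d)
    }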
By combining this function with a distance calculation between all the countries, you can order the list for each country by ascending distance and then compute the variance. That should give the result you are looking for.
Hope this functional explanation helps you.
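As a rough sketch of the approach described in this answer, applied to counties and without an explicit loop: the code below restates the Haversine formula compactly, builds the full distance matrix in one vectorised call, and takes the variance of the percentage column over each county's 4 nearest neighbours. The data frame `counties`, its column names (`lat`, `long`, `pct`) and all values are hypothetical, and excluding a county from its own neighbour set is an assumption about what the question intends.

```r
# Haversine distance in km (same formula as earth.dist in the answer);
# all four arguments may be vectors of equal length.
earth.dist <- function(long1, lat1, long2, lat2) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (long2 - long1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * 6378.145 * atan2(sqrt(a), sqrt(1 - a))
}

# Hypothetical data: one row per county (names and values are made up).
counties <- data.frame(
  county = c("A", "B", "C", "D", "E", "F"),
  lat    = c(40.0, 40.1, 40.2, 41.0, 41.1, 45.0),
  long   = c(-75.0, -75.1, -75.2, -76.0, -76.1, -80.0),
  pct    = c(10, 12, 14, 30, 32, 50)
)

n <- nrow(counties)
i <- rep(seq_len(n), times = n)  # row index of every county pair
j <- rep(seq_len(n), each = n)   # column index of every county pair

# Full n x n distance matrix from a single vectorised call.
d <- matrix(earth.dist(counties$long[i], counties$lat[i],
                       counties$long[j], counties$lat[j]), nrow = n)
diag(d) <- Inf  # assumption: a county is not its own neighbour

k <- 4
# Column r holds the row indices of the k counties nearest to county r.
nearest <- apply(d, 1, function(row) order(row)[1:k])

# Variance of pct across each county's k nearest neighbours.
counties$nn_var <- apply(nearest, 2, function(idx) var(counties$pct[idx]))
```

The full matrix needs memory proportional to the square of the number of counties, which is fine for a few thousand rows. If the variance of the neighbour distances is wanted instead of the variance of the percentages, replace `counties$pct[idx]` with the matching entries of `d`.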
-
I am only using one country, the US. The US has states, and each state has many counties. – bill89 May 09 '16 at 16:38
-
In that case, I guess you can use counties instead. My best attempt without data is the following steps: 1. For each county, compute the distances to all the others using `earth.dist` + `lapply`. 2. Keep only the first 4 rows for each one. 3. Then compute the variance of the distances for each one. – Francisco Calvo Pérez May 10 '16 at 14:05