I want to perform coordinate descent iteration on a set of points whose latitudes and longitudes are given in arrays. The iteration should help me estimate the nearest locations for this set of latitude/longitude points. I have two arrays, 'a' and 'b', of lat/long values; both arrays denote the same set of locations.

       Longitude     Latitude
  1.    100.1130      17.5406
  2.     99.8961      20.0577
  3.     99.8829      20.0466
  4.    101.2457      16.8041
  5.    102.1314      19.8881
Ross
  • Perhaps some help [here](http://stackoverflow.com/questions/29154705/why-no-variable-selection-when-running-glmnet-on-diabetes-dataset-with-alpha-1) – MichaelChirico Aug 15 '15 at 18:54
  • I don't understand why this needs to be iterative. Won't the set of closest points to a given point always be the same, regardless of iteration? If one of the methods of the `dist` function works, you could quickly figure out the minimum distance for each point. – Tad Dallas Aug 15 '15 at 19:42

2 Answers

You could use the dist() function for this (mat is defined under "data" at the end of this answer):

mat2 <- dist(mat,method = "euclidean")
           1.         2.         3.         4.
2. 2.52642792                                 
3. 2.51654168 0.01724674                      
4. 1.35108902 3.52240445 3.51724752           
5. 3.09591583 2.24172484 2.25407952 3.20866335

The five smallest pairwise distances are

> head(sort(mat2),5)
#[1] 0.01724674 1.35108902 2.24172484 2.25407952 2.51654168

The pairs of points in the distance matrix can be deduced quite easily from their index:

> head(order(mat2),5)
#[1] 5 3 7 9 2

The index is the entry counting columnwise starting from the upper left, so mat2[5]=0.01724674 is the distance between point 2 and point 3, mat2[3]=1.351089 is the distance between point 1 and point 4, etc.

We can define a function that extracts these pairs:

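dist_pairs <- function(x, n){
  # offsets[i] = number of entries stored before column i of the dist object
  offsets <- cumsum(c(0, (n - 1):1))
  idx1 <- findInterval(x - 1, offsets)   # column: the lower-numbered point
  idx2 <- x - offsets[idx1] + idx1       # row: the higher-numbered point
  return(c(idx1, idx2))
}

where the second argument is the number of rows in the original matrix; 5 in this case. As an example, the result of

> dist_pairs(9, nrow(mat))
#[1] 3 5

means that entry number 9 in the distance matrix contains the distance between points 3 and 5.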
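
One way to double-check which pair a given entry refers to is combn(), which enumerates the pairs in the same column-wise order in which a dist object stores its values. The following is only a sketch; the names d, pair_idx, closest and result are mine and do not appear elsewhere in this answer.

```r
# Same five points as in the question
mat <- cbind(Longitude = c(100.1130, 99.8961, 99.8829, 101.2457, 102.1314),
             Latitude  = c(17.5406, 20.0577, 20.0466, 16.8041, 19.8881))

d <- dist(mat, method = "euclidean")

# Column k of pair_idx holds the two point numbers behind d[k]
pair_idx <- combn(nrow(mat), 2)

# Positions of the five smallest distances, then the pairs they belong to
closest <- head(order(d), 5)
result <- data.frame(i = pair_idx[1, closest],
                     j = pair_idx[2, closest],
                     distance = d[closest])
print(result)
```

Comparing result with as.matrix(d) confirms that each listed distance really sits at row i, column j of the full matrix.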

Edit

By looking at the answer by @Jaap and re-reading the OP, I realized that you are interested in finding the point that is closest to each data point, and not necessarily in ranking those pairs in your set which have the smallest distance between each other.

To obtain this information, the code can be adapted in a similar way as suggested by @Jaap:

mat3 <- as.matrix(mat2)
diag(mat3) <- NA
mat <- as.data.frame(mat)
mat$closest <- apply(mat3,1,which.min)
> mat
#  Longitude Latitude closest
#1  100.1130  17.5406       4
#2   99.8961  20.0577       3
#3   99.8829  20.0466       2
#4  101.2457  16.8041       1
#5  102.1314  19.8881       2
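
If you also want to know how far away the closest point is, the same masked matrix can be reused. This is a minimal sketch, assuming the planar (Euclidean) distance in degrees used above; the names dm and res are hypothetical.

```r
# Same five points as in the question
mat <- cbind(Longitude = c(100.1130, 99.8961, 99.8829, 101.2457, 102.1314),
             Latitude  = c(17.5406, 20.0577, 20.0466, 16.8041, 19.8881))

dm <- as.matrix(dist(mat, method = "euclidean"))
diag(dm) <- NA   # a point must not be reported as its own neighbour

res <- data.frame(mat,
                  closest  = apply(dm, 1, which.min),          # index of the nearest point
                  distance = apply(dm, 1, min, na.rm = TRUE))  # its distance, in degrees
print(res)
```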

data

mat <- as.matrix(read.table(text="     Longitude        Latitude
      100.1130      17.5406
       99.8961      20.0577
       99.8829      20.0466
      101.2457      16.8041
      102.1314      19.8881", header=TRUE))
RHertel
  • I think for lat/lon you might want to use great-circle distance. There are functions in sp or other packages. – Rorschach Aug 15 '15 at 19:54
  • In general I perfectly agree. In this specific case the values of the coordinates seem to be sufficiently close that a "flat earth" approximation might be reasonably applicable. – RHertel Aug 15 '15 at 19:58
When you want to calculate the distance between points with latitude/longitude coordinates, the distm function from the geosphere package gives you several methods: distCosine, distHaversine, distVincentySphere & distVincentyEllipsoid. Of these, the distVincentyEllipsoid is considered the most accurate one. In these answers I showed how to calculate the distance between two different lists of points:

However, your case is a bit different as you want to compare within a list of points with coordinates. By slightly changing the method I showed in these answers, you can achieve the same. An illustration on how to do that with the data you provided:

The data:

points <- structure(list(p = structure(1:5, .Label = c("A", "B", "C", "D", "E"), class = "factor"),
                         lon = c(100.113, 99.8961, 99.8829, 101.2457, 102.1314), 
                         lat = c(17.5406, 20.0577, 20.0466, 16.8041, 19.8881)), 
                    .Names = c("p", "lon", "lat"), class = "data.frame", row.names = c(NA, -5L))

Note that I added a variable p with names for the points.

Original method:

First you create a distance matrix with:

library(geosphere)
distmat <- distm(points[,c('lon','lat')], points[,c('lon','lat')], fun=distVincentyEllipsoid)

which gives:

> distmat
         [,1]       [,2]       [,3]     [,4]     [,5]
[1,]      0.0 279553.803 278446.927 145482.3 335897.8
[2,] 279553.8      0.000   1848.474 387314.3 234708.0
[3,] 278446.9   1848.474      0.000 386690.7 235998.8
[4,] 145482.3 387314.334 386690.666      0.0 353951.5
[5,] 335897.8 234707.958 235998.784 353951.5      0.0

When you now assign the nearest point to each point with:

points$nearest <- points$p[apply(distmat, 1, which.min)]

each point will be assigned to itself as nearest point:

> points
  p      lon     lat nearest
1 A 100.1130 17.5406       A
2 B  99.8961 20.0577       B
3 C  99.8829 20.0466       C
4 D 101.2457 16.8041       D
5 E 102.1314 19.8881       E

Adaptation:

You can prevent that behavior by replacing the 0 values in the distance matrix distmat with NA:

distmat[distmat==0] <- NA

When you now assign the nearest point to each point with:

points$nearest <- points$p[apply(distmat, 1, which.min)]

you get the correct values:

> points
  p      lon     lat nearest
1 A 100.1130 17.5406       D
2 B  99.8961 20.0577       C
3 C  99.8829 20.0466       B
4 D 101.2457 16.8041       A
5 E 102.1314 19.8881       B
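
If installing geosphere is not an option, a spherical haversine distance is easy to write in base R. The following is only a sketch: the haversine() helper and the Earth radius of 6378137 m are my assumptions and not part of geosphere, and a sphere is less accurate than distVincentyEllipsoid. It also masks the self-distances with diag() instead of testing for 0, so that two points with identical coordinates would still be reported as each other's nearest neighbour.

```r
# Great-circle distance in metres on a sphere (assumed radius, see above)
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))   # pmin guards against rounding above 1
}

lon <- c(100.113, 99.8961, 99.8829, 101.2457, 102.1314)
lat <- c(17.5406, 20.0577, 20.0466, 16.8041, 19.8881)

# Full pairwise matrix; haversine() is vectorised, so outer() can call it directly
idx <- seq_along(lon)
hav <- outer(idx, idx, function(i, j) haversine(lon[i], lat[i], lon[j], lat[j]))

diag(hav) <- NA                       # exclude each point's distance to itself
nearest <- apply(hav, 1, which.min)   # index of the nearest other point
print(LETTERS[nearest])
```

For these five points the spherical distances differ from the ellipsoidal ones by a few metres at most for the closest pair, which is not enough to change any nearest-point assignment.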
Jaap
  • I guess I framed the question wrong. I do not want to find the nearest point within the same set of 5 points. I want to iterate each of the 5 points 100 times and find the nearest point from those 100 iterations. For example: I want to iterate point A 100 times and find the nearest lat/long among those 100 iterations, and do the same for all other points. In short: 100 iterations for each point, finding the nearest point within those iterations, not within the dataset. – Ross Aug 16 '15 at 16:48
  • @Ross Could you specify what you mean by "iterate over point A 100 times"? You will have to give us more information on how you want to iterate. As your question is now written, it is not clear. – Jaap Aug 16 '15 at 18:38
  • At each iteration the program should update each location with L1-multivariate median of the locations to which it has non-zero weights. – Ross Aug 16 '15 at 19:02
  • @Ross Maybe you can show the code you used to get to that `wab` matrix. You can also include code you tried, even if it didn't work. – Jaap Aug 16 '15 at 19:09
  • I just generated the Wab matrix by a random sample. The values in the Wab matrix are not important. It can be any random number. – Ross Aug 16 '15 at 20:01
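
Going by the comments, each sweep replaces a location with the weighted L1 (geometric) median of the locations to which it has non-zero weight. One common way to compute that median is Weiszfeld's algorithm; nothing in the question confirms that this is the intended method, so treat the following as a sketch only. The l1_median() helper, the random weight matrix W, the seed and the iteration counts are all hypothetical, and longitude/latitude are treated as planar coordinates.

```r
# Weighted L1 (geometric) median via Weiszfeld iterations (assumed method)
l1_median <- function(X, w, iters = 50, eps = 1e-9) {
  y <- colSums(X * w) / sum(w)               # start from the weighted mean
  for (k in seq_len(iters)) {
    d <- sqrt(rowSums(sweep(X, 2, y)^2))     # distances to the current estimate
    u <- w / pmax(d, eps)                    # eps guards against division by zero
    y <- colSums(X * u) / sum(u)
  }
  y
}

X <- cbind(lon = c(100.113, 99.8961, 99.8829, 101.2457, 102.1314),
           lat = c(17.5406, 20.0577, 20.0466, 16.8041, 19.8881))

set.seed(1)
W <- matrix(runif(25), 5, 5)   # hypothetical weights, "any random number"
diag(W) <- 0                   # a location has no weight to itself

locs <- X
for (sweep_no in 1:100) {                    # 100 sweeps over all locations
  for (a in seq_len(nrow(locs))) {           # update one location at a time
    nz <- which(W[a, ] > 0)                  # locations with non-zero weight
    locs[a, ] <- l1_median(locs[nz, , drop = FALSE], W[a, nz])
  }
}
print(locs)
```

Because every update is a weighted combination of the other current locations, the points tend to be drawn toward one another as the sweeps proceed, so the number of sweeps and the sparsity of W matter for what you get out.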