0

I am hoping to build a matrix of 'fake' coordinates (i.e., they could be anywhere on Earth) based on their distances (in km or m) from one another.

I have a data frame containing distances between locations.

dist_df<- data.frame(site1=c("a","b","c","d"),
                     site2=c("b","c","d","a"),
                     distance = c(222.1, 672.4, 45.2, 65.4))

However, the actual coordinates have obstructions between them. Thus, I have circumvented the obstructions with another bit of code to obtain a least-cost distance. To run a series of later functions I require lat/long. I figured the easiest way to do this was to generate 'fake' coordinates based on their relative least-cost distances.

I am hoping to obtain something like the following:

coordinates_sites<-data.frame(site=c("a","b","c","d"), 
                              lat=c(34.5332,32.1232,30.4232,30.4232),
                              long=c(-120.2222,-125.4422,-123.3512,-122.4232))

Any thoughts on the best way to do this? Many thanks in advance!

Phil
  • 7,287
  • 3
  • 36
  • 66
Griff
  • 27
  • 5
  • Seems related to [this](https://stackoverflow.com/q/48663440/5325862) and [this](https://stackoverflow.com/q/64391369/5325862). Do you have any wiggle room in your distances, or they have to be exactly those values? – camille Mar 15 '22 at 16:43

1 Answers1

1

Here is a solution using cmdscale. As cmdscale needs a complete distance matrix, I had to fill in the missing distances in the dataframe you provided:

dist_df<- data.frame(site1=c("a","b","c","d", "d", "c"),
                     site2=c("b","c","d","a", "b", "a"),
                     distance = c(222.1, 672.4, 45.2, 65.4, 10.3, 110.1))

dst <- dist_df  %>%
  {
    names <- union(.$site1, .$site2)
    m <- matrix(0,
                nrow = length(names),
                ncol = length(names),
                dimnames = list(names, names))
    m[cbind(.$site1, .$site2)] <- pull(., distance)
    m[cbind(.$site2, .$site1)] <- pull(., distance)
    diag(m) <- 0
    as.dist(m)
  } 

dst

##>       a     b     c
##> b 222.1            
##> c 110.1 672.4      
##> d  65.4  10.3  45.2

The fake coordinates can be obtained by the following:

cmdscale(dst)

##>        [,1]        [,2]
##>a  -19.88532  39.4181776
##>b  341.11969  -0.9109572
##>c -331.30946  -4.3427637
##>d   10.07509 -34.1644567

Note that the solution returned by cmdscale is an optimization and does not preserve faithfully input distances:

dist(cmdscale(dst))

##>           a         b         c
##> b 363.25068                    
##> c 314.48372 672.43790          
##> d  79.44829 332.71057 342.68461

However, if you start from a matrix of actual distances, cmdscale can reconstruct accurately the coordinates. It is shown in the cmdscale example:

loc <- cmdscale(eurodist)
x <- loc[, 1]
y <- -loc[, 2] # reflect so North is at the top
## note asp = 1, to ensure Euclidean distances are represented correctly
plot(x, y, type = "n", xlab = "", ylab = "", asp = 1, axes = FALSE,
     main = "cmdscale(eurodist)")
text(x, y, rownames(loc), cex = 0.6)

enter image description here

Stefano Barbi
  • 2,978
  • 1
  • 12
  • 11
  • This is fairly helpful. The trouble is that I would like to end up with a coordinate system that reflects actual locations on earth (sorry should have been more clear about that), which is necessary for a downstream analysis I am performing. Trying to think of a way to do this...perhaps by assigning one point a "real" coordinate and computing the remaining coordinates based on their relative distances from that one? – Griff Jun 29 '22 at 00:53
  • I think you need to provide at least two points and then find the optimal linear transformation of computed coordinates such that distance from real coordinates is minimum. – Stefano Barbi Jun 29 '22 at 09:20