N'th minimum pair from dist function

Question

Assume I have the following dataset:

dt <- data.frame(X = sample(4), Y = sample(4))

When I run dist on it with

dist(dt, method = "euclidean")

it gives me a matrix like:

         1        2        3
2 3.162278                  
3 2.236068 3.605551         
4 2.236068 2.236068 1.414214

I need to know which pairs have the N'th minimum distance.

For example, the first minimum pair is (3,4) with distance 1.414214; the second pairs are (1,3), (1,4), and (2,4) with 2.236068, and so on. So, how can I write such a function?

I don't know how to use `set.seed()`, but I will read about it and update my question – Jeff, Apr 26 '17 at 14:57

score 2 · Accepted Answer · edited May 23 '17 at 12:17

You want to split the row / column indices by distance value:

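n <- nrow(dt) - 1
j <- rep.int(1:n, n:1)    # column number of each lower-triangle entry
i <- j + sequence(n:1)    # row number of each lower-triangle entry
x <- dist(dt)             # distances, stored column by column
loc <- data.frame(i, j)   # one (i, j) row per entry of x
pair <- split(loc, x)     # group the index pairs by distance value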

Sometimes it is a good idea to enforce factor levels:

lev <- sort(unique(x))
pair <- split(loc, factor(x, lev))
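
As a concrete check, the sketch below runs the code above on a fixed dataset. The values of `X` and `Y` are an assumption: they were chosen because they reproduce the distance matrix shown in the question, whereas the original `sample(4)` call gives different data on each run.

```r
# Fixed data (assumed values) that reproduce the matrix in the question
dt <- data.frame(X = c(1, 4, 2, 3), Y = c(2, 1, 4, 3))

n <- nrow(dt) - 1
j <- rep.int(1:n, n:1)    # column number of each lower-triangle entry
i <- j + sequence(n:1)    # row number of each lower-triangle entry
x <- dist(dt)
loc <- data.frame(i, j)

v <- as.vector(x)                  # plain numeric vector of distances
lev <- sort(unique(v))             # distinct distances, ascending
pair <- split(loc, factor(v, lev))

pair[[1]]   # pairs at the smallest distance
pair[[2]]   # pairs at the second smallest distance
```

With this data, `pair[[1]]` should hold the single pair (4, 3), and `pair[[2]]` the three pairs (3, 1), (4, 1) and (4, 2), all at distance sqrt(5). Each pair is reported with the row index `i` larger than the column index `j`, matching the lower triangle printed by `dist`.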

Misc

The solution above is exhaustive: even if you only want the indices for the minimum, it builds the full list. You can then extract a single result; for example, pair[3] returns the entry for the 3rd minimum as a list of length one, and pair[[3]] returns the data frame itself.
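
To get the function the question asks for, the steps can be wrapped as in the sketch below. The name `nth_min_pairs` and the example data are hypothetical and not part of the original answer; the helper only repackages the split-based approach and reads the number of points from the `"Size"` attribute of the `dist` object.

```r
# Hypothetical helper: pairs whose distance is the N-th smallest distinct value
nth_min_pairs <- function(d, N) {
  n <- attr(d, "Size") - 1          # a "dist" object stores the number of points
  j <- rep.int(1:n, n:1)            # column number of each entry
  i <- j + sequence(n:1)            # row number of each entry
  v <- as.vector(d)
  lev <- sort(unique(v))            # distinct distances, ascending
  if (N < 1 || N > length(lev)) stop("N is out of range")
  pair <- split(data.frame(i, j), factor(v, lev))
  list(distance = lev[N], pairs = pair[[N]])
}

# Assumed example data that reproduce the matrix in the question
dt <- data.frame(X = c(1, 4, 2, 3), Y = c(2, 1, 4, 3))
res <- nth_min_pairs(dist(dt), 2)
res$distance   # the second smallest distance
res$pairs      # data frame of (i, j) pairs at that distance
```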

While this is interesting in its own right, it is inefficient if you only ever want the result for one entry and discard the rest. My answer to another question helps here: "R - How to get row & column subscripts of matched elements from a distance matrix", where you also learn the basics of indexing a lower triangular matrix.
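
A minimal sketch of that idea follows. It is not the code from the linked answer; the helper name `nth_min_direct` and the example data are assumptions. It finds the positions of the N-th smallest distance in the packed lower triangle and converts only those positions to (row, column) subscripts, using the fact that column `j` of a `dist` object on `size` points stores `size - j` entries.

```r
# Hypothetical helper: locate only the N-th smallest distance, without
# splitting every entry into groups
nth_min_direct <- function(d, N) {
  size <- attr(d, "Size")               # number of points
  v <- as.vector(d)
  target <- sort(unique(v))[N]          # N-th smallest distinct distance
  k <- which(v == target)               # positions in the packed lower triangle
  offs <- c(0, cumsum((size - 1):1))    # number of entries stored before each column
  j <- findInterval(k - 1, offs)        # column of each position
  i <- j + (k - offs[j])                # row of each position
  data.frame(i, j)
}

# Assumed example data that reproduce the matrix in the question
dt <- data.frame(X = c(1, 4, 2, 3), Y = c(2, 1, 4, 3))
res <- nth_min_direct(dist(dt), 2)
res
```

This still sorts the distinct distances to find the target value, but it avoids building index vectors and a list element for every entry. It also compares distances with exact equality, so values that differ only by floating-point rounding are treated as distinct.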
