
Assume I have the following dataset:

dt<-data.frame(X=sample(4),Y=sample(4))

When I run dist on it with

dist(dt, method = "euclidean")

it gives me a matrix like:

         1        2        3
2 3.162278                  
3 2.236068 3.605551         
4 2.236068 2.236068 1.414214

I need to know which pairs have the N-th smallest distance.

For example, going by the matrix above, the first minimum pair is (3,4) with distance 1.414214; the second minimum pairs are (1,3), (1,4) and (2,4) with distance 2.236068; and so on. How can I write a function that does this?

Jeff
    I don't know how to use `set.seed()`, but I will read about it and update my question – Jeff Apr 26 '17 at 14:57

1 Answer


You want to split the row / column indices by distance value:

n <- nrow(dt) - 1
j <- rep.int(1:n, n:1)    # column number
i <- j + sequence(n:1)    # row number
x <- dist(dt)
loc <- data.frame(i, j)
pair <- split(loc, x)

Sometimes it is a good idea to enforce factor levels:

lev <- sort(unique(x))
pair <- split(loc, factor(x, lev))
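
As a worked example, here is the same recipe run end to end on fixed data. The data frame below is an assumption: it was chosen by hand so that dist reproduces the matrix shown in the question (the original data came from sample(4) without a seed, so the real values are unknown). The as.vector() call is also an addition, used only to drop the dist attributes before grouping.

```r
# Fixed data chosen to reproduce the distance matrix in the question
# (an assumption; the asker's actual random draw is not known)
dt <- data.frame(X = c(4, 1, 3, 2), Y = c(2, 1, 4, 3))

x <- dist(dt)
n <- attr(x, "Size") - 1
j <- rep.int(1:n, n:1)      # column index of each lower-triangle entry
i <- j + sequence(n:1)      # row index of each lower-triangle entry
loc <- data.frame(i, j)

d <- as.vector(x)           # plain numeric vector of distances
lev <- sort(unique(d))      # distinct distances, smallest first
pair <- split(loc, factor(d, lev))

pair[[1]]   # smallest distance: i = 4, j = 3
pair[[2]]   # second smallest: (3,1), (4,1), (4,2)
```

Each element of pair is a data frame of (i, j) indices with i > j, so the pair (3,4) from the question shows up as i = 4, j = 3.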

Misc

My solution above is exhaustive: even if you only want the indices for the minimum, it builds the full list. You can then extract a single result, for example pair[3] for the 3rd minimum.

While this is interesting in its own right, it is inefficient if you only ever want the result for one entry and discard the rest. My answer to this question may help: R - How to get row & column subscripts of matched elements from a distance matrix. It also covers the basics of lower triangular matrices.
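
If only one rank is needed, a rough sketch along these lines avoids building the full list. This is not the code from the linked answer; kth_min_pairs is a hypothetical helper written for illustration, and the example data is the same hand-picked assumption that reproduces the matrix in the question. It relies on dist storing the lower triangle column by column.

```r
# Hypothetical helper: (i, j) pairs whose distance is the k-th smallest
# distinct value in a "dist" object x
kth_min_pairs <- function(x, k) {
  d <- as.vector(x)                    # packed lower triangle, column by column
  N <- attr(x, "Size")                 # number of observations
  target <- sort(unique(d))[k]         # k-th smallest distinct distance
  p <- which(d == target)              # positions of the matching entries
  counts <- rev(seq_len(N - 1))        # entries stored for columns 1..N-1
  offsets <- head(c(0, cumsum(counts)), -1)  # entries before each column
  j <- findInterval(p - 1, offsets)    # column of each matching position
  i <- j + (p - offsets[j])            # row of each matching position
  data.frame(i = i, j = j, distance = d[p])
}

# Assumed example data (reproduces the matrix shown in the question)
dt <- data.frame(X = c(4, 1, 3, 2), Y = c(2, 1, 4, 3))
res <- kth_min_pairs(dist(dt), 2)
res
```

The sketch still sorts the distinct distances once, so it saves the grouping step rather than the sort.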

Zheyuan Li