4

I used the dist function in R and I am wondering the time complexity of it.

I know that the hierarchical clustering has a N^2*logN time complexity. And hierarchical clustering is composed of two parts as below codes in R.

> d <- dist(as.matrix(mtcars))   # find distance matrix 
> hc <- hclust(d)                # apply hirarchical clustering 
> plot(hc)                       # plot the dendrogram

before applying hierarchical clustering, calculating the distance matrix is required. I think this takes N^2 complexity?

Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
sclee1
  • 1,095
  • 1
  • 15
  • 36

1 Answers1

4

Precisely, if matrix X has N rows P columns, the complexity of dist(X) is 3N(N-1)P/2. This is computed by N(N - 1)/2 * 3P.

Explanation:

  • There are N(N - 1)/2 entries in the resulting distance matrix;
  • Each entry is a dot product between two length P vectors (plus a square root), each involving P subtraction, P multiplication and P addition.
Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248