1

I'm conjecturing that with Complete-linkage clustering two elements from the same cluster will always be closer to each other then some other element from another cluster.

In more formal terms:

Let $C$ be a clustering. $\not\exists z \in C_j$ s.t. $\bigtriangleup(x, z) < \bigtriangleup(x, y)$ where $x,y \in C_i$, $C_i \neq C_j$ and $C_i, C_j \in C$.

I haven't been able to prove the conjecture yet, thus I'm wondering whether I'm right or wrong. If this is indeed the case, I would much appreciate a sketch of a proof. I'm pretty sure I can work my way from there.

On a side note (not that I think it makes a difference), I'll be applying the clustering algorithm on a one-dimensional dataset.

Your input is much appreciated.

AndyG
  • 39,700
  • 8
  • 109
  • 143
alexT
  • 49
  • 5
  • 1
    I really wish StackOverflow supported LaTeX embedding. – AndyG Apr 16 '15 at 12:23
  • Is this an accurate statement of what you are trying to prove? "There will never exist an element in another cluster (Cluster B) that is nearer to some element in the cluster under consideration (Cluster A) than any other element in the cluster under consideration (Cluster A). That is, the minimum distance between an element in Cluster A to all other elements in Cluster A will always be less than the minimum distance from an element in Cluster A to any element in Cluster B" – AndyG Apr 16 '15 at 12:45
  • I'm pretty sure **your conjecture is incorrect**. In particular, it would probably imply that single-link and complete-link always yield the same result, which they don't. – Has QUIT--Anony-Mousse Apr 16 '15 at 13:51

1 Answers1

1

I'm not sure your conjecture is true. Because of the nature of complete-linkage clustering, each time you agglomerate two clusters, you are doing so because the two elements that are farthest apart between those two clusters, are still mutually closer to each other than the farthest elements to any other cluster.

What you're trying to prove is that

"There will never exist an element in another cluster (Cluster B) that is nearer to some element in the cluster under consideration (Cluster A) than any other element in the cluster under consideration (Cluster A). That is, the minimum distance between an element in Cluster A to all other elements in Cluster A will always be less than the minimum distance from an element in Cluster A to any element in Cluster B"

However, after merging two clusters A and B due to complete-linkage clustering, there could still exist an element in cluster C that is nearer to an element in Cluster AB than any other element in cluster AB because complete-linkage is only concerned about maximal distances.

Counter-example:

A--1--B--3--C--2.5--D--2--E

How to interpret the example:

  • The observations, A, B, C, D, and E are arranged in a straight line.
  • Observation A is 1 unit away from observation B
  • Observation B is 3 units away from observation C
  • Observation C is 2.5 units away from observation D
  • Observation D is 2 units away from observation E

Let's perform hierarchical clustering:

  1. First A and B are merged because distance is 1:

New picture:

AB--4--C--2.5--D--2--E

  • Cluster AB is 4 units away from observation C (because A is 4 units from C due to complete-linkage clustering), which is 2.5 units from D, which is 2 units from E

    1. Next, D and E are merged because distance is 2

New picture

AB--4--C--4.5--DE

  • Cluster AB is 4 units from observation C (as before), which is 4.5 units from Cluster DE because C is 4.5 units from E due to complete-linkage clustering.

    1. Next, C would need to be merged into AB because its distance is 4 whereas it's 4.5 to DE:

ABC--8.5--DE

  • Cluster ABC is 8.5 units from DE because A is 8.5 units from E.

BUT, at this point we've disproven your conjecture. Element C is 3 units from B and 4 units from A (refer to original diagram). However, Element C is only 2.5 units from element D, which is inside another cluster.

AndyG
  • 39,700
  • 8
  • 109
  • 143