I wonder whether the triangle inequality is necessary for the distance measure used in k-means.
3 Answers
k-means is designed for Euclidean distance, which happens to satisfy the triangle inequality.
Using other distance functions is risky, as the algorithm may stop converging. The reason, however, is not the triangle inequality, but that the mean might not minimize the distance function. (The arithmetic mean minimizes the sum of squares, not arbitrary distances!)
There are faster variants of k-means that exploit the triangle inequality to avoid recomputations. But if you stick to classic MacQueen or Lloyd k-means, you do not need the triangle inequality.
Just be careful, when using other distance functions, not to run into an infinite loop. You need to prove that the mean minimizes the distances from the cluster members to their center. If you cannot prove this, the method may fail to converge, as the objective function no longer decreases monotonically. So you really should try to prove convergence for your distance function!
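To make the convergence point concrete, here is a minimal, hedged sketch of a Lloyd-style k-means with a pluggable distance/center pair. The function names and the monotonicity check are my own illustration, not any library's API: with squared Euclidean distance and the arithmetic mean the objective decreases monotonically, while other pairs may trip the check.

```python
import numpy as np

def lloyd_kmeans(X, k, distance, center, n_iter=100, seed=0):
    """Minimal Lloyd-style k-means with a pluggable distance/center pair.

    `distance(a, b) -> float` and `center(points) -> 1-D array` are
    placeholders you supply. With squared Euclidean distance and the
    arithmetic mean, the objective decreases monotonically; with other
    pairs, the monotonicity check below may trip.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    prev_obj = np.inf
    for _ in range(n_iter):
        # Assignment step: each point goes to the nearest current center.
        labels = np.array([np.argmin([distance(x, c) for c in centers]) for x in X])
        # Update step: recompute each center from its members
        # (keep the old center if a cluster happens to be empty).
        centers = np.array([center(X[labels == j]) if np.any(labels == j) else centers[j]
                            for j in range(k)])
        # Objective: total distance of every point to its assigned center.
        obj = sum(distance(x, centers[j]) for x, j in zip(X, labels))
        if obj > prev_obj + 1e-9:
            raise RuntimeError("objective increased; this distance/center pair "
                               "does not give a monotone (hence convergent) k-means")
        if prev_obj - obj < 1e-9:
            break
        prev_obj = obj
    return labels, centers

# The classic, safe combination: squared Euclidean distance + arithmetic mean.
X = np.random.default_rng(1).normal(size=(200, 2))
labels, centers = lloyd_kmeans(
    X, k=3,
    distance=lambda a, b: float(np.sum((a - b) ** 2)),
    center=lambda pts: pts.mean(axis=0),
)
```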

- My goal is to create clusters that have the minimum number of 1 bits across all of their members (I need storage space for each 1 bit). I defined the center as the Or() of all members and used |Or(x,y)| as the distance function. Currently I use a linkage algorithm to create hierarchical clusters instead of k-means, and that works great – Masood_mj Jul 18 '12 at 06:19
- @Anony-Mousse: Do you have a reference for the requirement that the mean must be a minimum variance estimator? I have read a fair share of ML textbooks (e.g. Bishop 2007, Alpaydin 2009), but I have never seen a requirement like that. – stackoverflowuser2010 Sep 14 '15 at 17:34
- @stackoverflowuser2010 The mean *is* the least-squares estimator of location, as proven by Gauss around 1800. That is not a requirement, but a fact. The need to *use* a consistent criterion in *both* steps arises from the **convergence proof**. But did any of these textbooks discuss convergence? (I've improved the wording above to make it easier to understand; a small numeric check of the least-squares point follows these comments.) – Has QUIT--Anony-Mousse Sep 14 '15 at 17:39
- Unfortunately, ML textbooks tend to be really shallow on unsupervised methods. – Has QUIT--Anony-Mousse Sep 14 '15 at 17:44
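To illustrate the least-squares point from the comments above, here is a tiny numeric check with made-up sample values: the arithmetic mean minimizes the sum of squared deviations, whereas the sum of absolute deviations is minimized by the median instead.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])   # arbitrary sample with an outlier
candidates = np.linspace(0.0, 20.0, 2001)  # candidate "centers" on a fine grid

sum_sq  = [np.sum((x - c) ** 2)  for c in candidates]
sum_abs = [np.sum(np.abs(x - c)) for c in candidates]

print(candidates[np.argmin(sum_sq)],  x.mean())      # 6.0 6.0 -> mean minimizes the sum of squares
print(candidates[np.argmin(sum_abs)], np.median(x))  # 3.0 3.0 -> median minimizes the sum of |.|
```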
Well, classical k-means is defined on Euclidean space with the L2 distance, so you get the triangle inequality automatically (the triangle inequality is part of how a distance/metric is defined). If you use a non-Euclidean metric, you would need to define what the "mean" is, among other things.
If you don't have the triangle inequality, two points could be very far from each other yet both be close to a third point. You need to think about how you would like to interpret this case.
Having said all that, I have in the past used average-linkage hierarchical clustering with a distance measure that did not fulfill the triangle inequality, among other things, and it worked great for my needs.
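For what it's worth, here is a minimal sketch of that hierarchical route, assuming SciPy is available: `linkage` takes a condensed pairwise-distance matrix, so any dissimilarity (metric or not) can be plugged in. The data and the dissimilarity below are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up binary data: rows are items, columns are bits.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(20, 16)).astype(bool)

# Any pairwise dissimilarity can go here, even one that violates the
# triangle inequality; this placeholder counts the bits set in the OR.
def dissim(a, b):
    return float(np.count_nonzero(a | b))

n = len(X)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = dissim(X[i], X[j])
# The diagonal is left at zero, which squareform/linkage expect.

Z = linkage(squareform(D), method="average")      # average linkage (UPGMA)
labels = fcluster(Z, t=4, criterion="maxclust")   # cut the tree into 4 clusters
```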

- Thanks. I am working on binary data and define the mean as the Or() of the bits of the points in a cluster. I want to use d(A,B)=|Xor(A,B)|/|And(A,B)|, which expresses the cost of adding a point to a cluster relative to the benefit. However, it does not satisfy the triangle inequality (see the counterexample sketched after these comments). I first considered the Jaccard distance, but its meaning is different. – Masood_mj Jul 16 '12 at 18:31
- I am not sure what your metric is trying to achieve, but k-means is really defined for the L2 (Euclidean) distance; other methods like UPGMA allow different metrics more naturally. Regarding a metric, it really depends on your goal, but how about the Hamming distance? It fulfills the triangle inequality. – Bitwise Jul 16 '12 at 20:47
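As a quick sanity check of the comment above that d(A,B)=|Xor(A,B)|/|And(A,B)| violates the triangle inequality, here is a concrete counterexample with three made-up bit patterns:

```python
# Bit patterns as Python ints; popcount via bin(...).count("1").
def d(a, b):
    return bin(a ^ b).count("1") / bin(a & b).count("1")

A = 0b110   # bits {1, 2}
B = 0b011   # bits {0, 1}
C = 0b111   # bits {0, 1, 2}

print(d(A, B))             # 2/1 = 2.0
print(d(A, C) + d(C, B))   # 1/2 + 1/2 = 1.0, so d(A, B) > d(A, C) + d(C, B)
```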
The dimension of a vector space is the maximum number of linearly independent vectors, and linear algebra gives you four fundamental subspaces (row space, column space, null space and left null space). For tabular data you can compute distances either with the Manhattan distance (no triangulation needed) or with the Euclidean distance (triangulation needed). Either way, in a Cartesian coordinate system (rows and columns) triangulation means taking the cos() between the vectors drawn from your points into account, which is what k-means usually does.
Since cos(90°)=0 (the closer the cosine is to 0, the closer the two vectors are to being orthogonal), yes, the triangle inequality is needed. If your points lie along one axis (no angle between them), the distance is the plain distance along that axis and 0 along the other, because cos(90°)=0. So when building a projection matrix of features you had better take the triangle inequality into account (as k-means does by using the Euclidean distance, not the Manhattan distance).
P.S. cos(90°)=0 is also what leads to the curse of dimensionality mentioned here.
P.P.S. Be careful: the Euclidean distance (in k-means) gives misleading information about outliers (k-medians seems better to use; k-medoids seems arguable).
P.P.P.S. For classification tasks you need not care about the distance [follow link] (care about the best cross-validated result); only for clustering do you need to care about the distance you choose for your algorithm.
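A small, hedged illustration of the near-orthogonality point above (sample size and dimensions are arbitrary): cosine similarities between random vectors concentrate around 0 as the dimension grows, which is one face of the curse of dimensionality.

```python
import numpy as np

rng = np.random.default_rng(0)
for dim in (2, 10, 100, 1000):
    # 500 random pairs of unit vectors; their cosine is just the dot product.
    a = rng.normal(size=(500, dim))
    b = rng.normal(size=(500, dim))
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    cos = np.sum(a * b, axis=1)
    print(dim, round(float(np.mean(np.abs(cos))), 3))
# The average |cos| shrinks toward 0 with growing dimension,
# i.e. random vectors become nearly orthogonal in high dimensions.
```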
