9

What is the difference between K-means clustering and vector quantization?

They seem to be very similar.

I'm dealing with Hidden Markov Models and I need to extract symbols from feature vectors.

In order to extract symbols, do I do vector quantization or k-means clustering?

nbro
  • 15,395
  • 32
  • 113
  • 196
garak
  • 4,713
  • 9
  • 39
  • 56

2 Answers2

18

The way I understand it, K-means is one type of vector quantization.

Rob Neuhaus
  • 9,190
  • 3
  • 28
  • 37
  • 2
    Exactly. K-means clustering is one method for performing vector quantization. The centroids found through K-means are (using information theory terminology) the *symbols* or *codewords* for your *codebook*. To decode a vector, assign the vector to the centroid (or codeword) to which it is closest. – Steve Tjoa Jan 26 '12 at 19:37
  • Maybe you should explain why k-means is a type of vector quantization by explaining what vector quantization is. – nbro Jun 13 '20 at 17:06
5

The K-means algorithms is the specialization of the celebrated "Lloyd I" quantization algorithm to the case of empirical distributions. (cf. Lloyd)

The Lloyd I algorithm is proved to yield a sequence of quantizers with a decreasing quadratic distortion. However, except in the special case of one-dimensional log-concave distributions, it dos not always converge to a quadratic optimal quantizer. (There are local minimums for the quantization error, especially when dealing with empirical distribution i.e. for the clustering problem.)

A method that converges (always) toward an optimal quantizer is the so-called CLVQ algorithms, which also generalizes to the problem of more general L^p quantization. It is a kind of Stochastic Gradient method. (cf. Pagès)

There are also some approaches based on genetic algorithms. (cf. Hamida et al.), and/or classical optimization procedures for the one dimensional case that converge faster (Pagès, Printems).

oers
  • 18,436
  • 13
  • 66
  • 75
Quantize
  • 51
  • 1