Well, there are two practical solutions in common use for the problem of intelligently
selecting the number of centroids (k).
The first is to run PCA on your data. The output from PCA--the
principal components (eigenvectors) and their cumulative contribution to the variance
observed in the data--suggests a reasonable number of centroids.
(E.g., if 95% of the variability in your data is explained by the first three principal
components, then k=3 is a wise choice for k-means.)
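For instance, here is a minimal sketch of that heuristic in NumPy (the function name, the 95% default threshold, and the eigendecomposition-of-covariance approach are my own illustrative choices, not from a particular library):

```python
import numpy as np

def choose_k_by_pca(X, variance_threshold=0.95):
    """Return the number of principal components needed to explain
    `variance_threshold` of the total variance in X (rows = samples)."""
    Xc = X - X.mean(axis=0)                   # center the data
    cov = np.cov(Xc, rowvar=False)            # covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigenvalues, descending
    cum = np.cumsum(eigvals) / eigvals.sum()  # cumulative variance ratio
    # Smallest number of components whose cumulative ratio meets the threshold:
    return int(np.searchsorted(cum, variance_threshold) + 1)
```

You would then pass the returned value as k to your k-means routine.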
The second commonly used practical solution for intelligently estimating k
is a revised implementation of the k-means algorithm, called k-means++. In essence,
k-means++ differs from the original k-means by the addition of a pre-processing
step. During this step, the number and initial positions of the centroids are estimated.
The algorithm that k-means++ relies on to do this is straightforward to understand
and to implement in code. A good source for both is a 2007 post on the LingPipe Blog,
which offers an excellent explanation of k-means++ and includes a citation to the
original paper that first introduced the technique.
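The heart of that pre-processing step is "D^2 sampling": after choosing the first centroid uniformly at random, each subsequent centroid is drawn from the data points with probability proportional to the squared distance from the point to its nearest already-chosen centroid. A minimal NumPy sketch (the function name and RNG handling are my own):

```python
import numpy as np

def kmeans_pp_init(X, k, seed=None):
    """k-means++ seeding: choose k initial centroids from X via D^2 sampling."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # First centroid: a data point chosen uniformly at random.
    centroids = [X[rng.integers(n)]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen centroid.
        diffs = X[:, None, :] - np.array(centroids)[None, :, :]
        d2 = np.min((diffs ** 2).sum(axis=-1), axis=1)
        # Sample the next centroid with probability proportional to D^2,
        # which strongly favors points far from all current centroids.
        centroids.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centroids)
```

The resulting array is then used as the starting centroids for the usual k-means iterations.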
Aside from providing an optimal choice for k, k-means++ is apparently superior to
the original k-means in both speed (roughly half the processing time of k-means
in one published comparison) and accuracy (a three-orders-of-magnitude improvement
in error in the same comparison).