7

I am looking to use the k-means algorithm to cluster some data, but I would like to use a custom distance function. Is there any way to change the distance function that scikit-learn uses?

I would also settle for a different framework or module that lets me swap in my own distance function and can compute k-means in parallel. (I would like to speed up the calculation, and parallelism is a nice feature of scikit-learn.)

Any suggestions?

Nils Ziehn

1 Answer

3

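You could try spectral clustering, which lets you supply your own precomputed matrix, built from whatever distance you like. Note that scikit-learn's implementation expects an affinity (similarity) matrix rather than raw distances, so the distances need to be converted first, for example with a Gaussian kernel.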

It performs comparably to k-means on convex clusters, and it also handles non-convex ones because it groups points by connectivity. See more here.

The good news is that spectral clustering is also implemented in scikit-learn.
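As a rough illustration, here is a minimal sketch of that approach. The `custom_distance` function (plain Manhattan distance), the toy data, and the choice of kernel width `sigma` are all assumptions made for the example, not part of the question; replace them with your own metric and data.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import pairwise_distances


def custom_distance(a, b):
    """Hypothetical custom metric (Manhattan distance here); swap in your own."""
    return np.abs(a - b).sum()


# Toy data (an assumption for this sketch): two well-separated blobs of 20 points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 2)),
    rng.normal(10.0, 0.5, size=(20, 2)),
])

# Pairwise distances using the custom callable; n_jobs=-1 asks for all cores.
D = pairwise_distances(X, metric=custom_distance, n_jobs=-1)

# Turn distances into similarities with a Gaussian kernel.
# Using the mean distance as the kernel width is a heuristic, not a rule.
sigma = D.mean()
affinity = np.exp(-D**2 / (2.0 * sigma**2))

# affinity="precomputed" tells scikit-learn to treat the input as a
# similarity matrix instead of raw feature vectors.
model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(affinity)
print(labels)
```

Keep in mind that a Python callable metric is evaluated pair by pair, so the distance matrix costs O(n²) time and memory; this is practical for a few thousand points but not for very large datasets.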

Hope it helps.

gowithefloww