4

I am trying to use the Manhattan distance with SpectralClustering() in sklearn. When I set the affinity parameter to 'manhattan', I get the following error:

ValueError: Unknown kernel 'manhattan'

What is the proper kernel name to use for it? Can anyone help? Basically, I want to use SpectralClustering to perform k-means with the Manhattan distance metric.

Here is the code that sets up SpectralClustering():

clustering = SpectralClustering(n_clusters=10, affinity='manhattan', assign_labels="kmeans")
clustering.fit(X)
Venkatachalam
MC X

3 Answers

3

The official documentation on Spectral Clustering tells you that you can use anything supported by sklearn.metrics.pairwise_kernels. Unfortunately there is no pairwise kernel for the Manhattan distance yet.

If something similar suffices, you could use the linear kernel like this:

clustering = SpectralClustering(n_clusters=10, affinity='linear', assign_labels="kmeans")
Felix
  • Thanks feliks! I see... linear kernel is to compute the dot product of two vectors which shows the similarity of two vectors. This does the same thing as manhattan distance – MC X Apr 03 '19 at 18:56
1

The Manhattan distance is not supported by sklearn.metrics.pairwise_kernels, which is the reason for the ValueError.

From Documentation:

Valid values for metric are:
['rbf', 'sigmoid', 'polynomial', 'poly', 'linear', 'cosine']
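
To check which kernel names your installed version accepts, you can inspect the registry that pairwise_kernels dispatches on. The following is a minimal sketch, not part of the original answer; the exact set of names depends on the scikit-learn version. In recent releases the registry also contains 'laplacian', which is built from the L1 (Manhattan) distance as exp(-gamma * ||x - y||_1), so it may be closer to what the question asks for than 'linear'. The gamma value here is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    PAIRWISE_KERNEL_FUNCTIONS,
    laplacian_kernel,
    manhattan_distances,
)

# Names accepted by pairwise_kernels (and therefore by `affinity`).
kernel_names = sorted(PAIRWISE_KERNEL_FUNCTIONS)
print(kernel_names)

# 'manhattan' is a distance metric, not a kernel, so it is not listed.
print("manhattan" in kernel_names)

X = np.array([[2, 3], [3, 5], [5, 8]], dtype=float)
gamma = 0.1  # arbitrary value chosen for illustration

# The Laplacian kernel is an exponential of the Manhattan distance.
K = laplacian_kernel(X, gamma=gamma)
K_manual = np.exp(-gamma * manhattan_distances(X))
print(np.allclose(K, K_manual))
```

If 'laplacian' is listed, passing affinity='laplacian' (together with a gamma that suits your data) to SpectralClustering should be accepted in the same way as 'linear'.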

The linear kernel and the Manhattan distance are different, as the following example shows:

>>> import numpy as np
>>> from sklearn.metrics import pairwise_distances
>>> from sklearn.metrics.pairwise import pairwise_kernels
>>> X = np.array([[2, 3], [3, 5], [5, 8]])
>>> Y = np.array([[1, 0], [2, 1]])
>>> pairwise_distances(X, Y, metric='manhattan')
array([[ 4.,  2.],
       [ 7.,  5.],
       [12., 10.]])
>>> pairwise_kernels(X, Y, metric='linear')
array([[ 2.,  7.],
       [ 3., 11.],
       [ 5., 18.]])

The Manhattan distance is available through sklearn.metrics.pairwise_distances.

A simple way to use the Manhattan distance with spectral clustering is to pass a precomputed matrix:

>>> from sklearn.cluster import SpectralClustering
>>> from sklearn.metrics import pairwise_distances
>>> import numpy as np
>>> X = np.array([[1, 1], [2, 1], [1, 0],
...               [4, 7], [3, 5], [3, 6]])

>>> X_precomputed = pairwise_distances(X, metric='manhattan')
>>> clustering = SpectralClustering(n_clusters=2, affinity='precomputed', assign_labels="discretize",random_state=0)
>>> clustering.fit(X_precomputed)
>>> clustering.labels_
>>> clustering 
Venkatachalam
  • 1
    This also makes a lot of sense. As far as I understand now, the linear kernel just provides a similarity score for a pair of data points, which is somewhat similar to what the Manhattan distance does. But your method clearly demonstrates how to apply the Manhattan distance to SpectralClustering. Thanks! – MC X Apr 04 '19 at 04:59
  • Feel free to change the accepted answer if you like, since it can help others learn how to use the Manhattan distance. – Venkatachalam Apr 04 '19 at 05:34
  • [manhattan](http://scikit-learn.org/stable/modules/metrics.html) already appeared – JeeyCi May 31 '23 at 11:06
-1

The elements of the precomputed matrix should be similarities rather than distances. You can use a Gaussian kernel to do this transformation.
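
A minimal sketch of that transformation, using a small toy data set. The bandwidth `delta` is a hypothetical value picked for this example and would need tuning on real data; the formula exp(-D**2 / (2 * delta**2)) is one common way to turn a distance matrix into a similarity matrix, not the only one.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import pairwise_distances

X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])

# Manhattan distances: 0 on the diagonal, larger means less similar.
D = pairwise_distances(X, metric="manhattan")

# Hypothetical bandwidth for this toy data; tune it for real data.
delta = 2.0

# Gaussian transform: 1 on the diagonal, close to 0 for distant points.
A = np.exp(-D ** 2 / (2.0 * delta ** 2))

clustering = SpectralClustering(n_clusters=2, affinity="precomputed",
                                assign_labels="discretize", random_state=0)
labels = clustering.fit_predict(A)
print(labels)
```

The resulting matrix is symmetric with ones on the diagonal, which matches what affinity='precomputed' expects: larger entries mean more similar points.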

neo