2

I am trying to implement a simple version of spectral clustering using the normalized (random walk) Laplacian matrix in Python. After testing my function with a toy dataset, I found that my Laplacian matrix has negative eigenvalues. Here is my spectral clustering code:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import pairwise_kernels, euclidean_distances, pairwise_distances
from sklearn.neighbors import NearestNeighbors

def nlapl(W):
    Dinv = 1 / np.sum(W, axis=1)
    Id = np.eye(W.shape[0])
    W = np.multiply(Dinv, W.T).T
    return Id - W

def sc(X, n_clusters, gamma):
    W = pairwise_kernels(X, metric='rbf', gamma=gamma)
    L = nlapl(W)

    lambdas, vs = np.linalg.eigh(L)
    lambdas = lambdas[:n_clusters]
    vs = vs[:,:n_clusters]

    print("lambdas:")
    print(lambdas)

    kmeans = KMeans(n_clusters=n_clusters, init='k-means++', max_iter=300, n_init=20, random_state=0).fit(vs)
    return vs, kmeans

Here is my test code:

from sklearn.datasets.samples_generator import make_blobs

X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
vs, kmeans = sc(X, 4, 2)

The function successfully identifies the clusters:

plt.figure()
plt.scatter(X[:,0], X[:,1], c=y, alpha=0.7)
plt.title('True Clusters')

plt.figure()
plt.scatter(X[:,0], X[:,1], c=kmeans.labels_, alpha=0.7)
plt.title('Spectral Clustering')

True Clusters Spectral Clustering

However, the Laplacian matrix has negative eigenvalues:

lambdas:
[-0.03429643 -0.02670478 -0.01684407 -0.0073953 ]

I'm pretty sure that my problem is in nlapl because if I use the unnormalized laplacian D - W, the eigenvalues are [-4.96328563e-15 5.94245930e-03 1.15181852e-02 1.51614560e-01]. However, I'm having trouble figuring out where my calculation is wrong. Am I missing something obvious? Thank you in advance for any advice.

EDIT: Since my toy dataset has 4 well-separated clusters, the theoretical multiplicity of the zero eigenvalue of L should be 4. However, the apparently multiplicity of zero with the unnormalized Laplacian is 1. Admittedly, some of the purple datapoints (in True Clusters) are pretty close to the other clusters, so maybe this isn't completely unexpected?

YLZ
  • 21
  • 2
  • `np.multiply` performs the multiplication element-wise. Did you mean to use matrix multiplication instead? e.g. `W = (Dinv @ W.T).T`? – hilberts_drinking_problem May 01 '20 at 04:43
  • @hilberts_drinking_problem ```Dinv``` is a vector while ```W``` is a 2d matrix, so I believe ```np.multiply``` should broadcast to the columns. Rewriting as ```Dinv = np.diag(1 / np.sum(W, axis=1)); W = Dinv @ W``` did not change the results. – YLZ May 01 '20 at 04:49

0 Answers0