
I have a 3000x50 feature matrix. I obtained a similarity matrix for it using sklearn.metrics.pairwise_distances and stored it as Similarity_Matrix. I then used networkx to create a graph from that matrix with G = nx.from_numpy_matrix(Similarity_Matrix). I now want to perform spectral clustering on this graph G, but several Google searches have failed to turn up a decent example of scikit-learn spectral clustering on such a graph :( The official documentation shows how spectral clustering can be done on image data, which is quite unclear, at least to a newbie like myself.

Can anyone give me a code sample for this, or for graph cuts or graph partitioning using networkx, scikit-learn, etc.?

Thanks a million!

Ken Williams
user3641802
  • `pairwise_distances` produces a *distance* matrix, but you need a *similarity* matrix (a kernel's Gram matrix). I don't see why you put NetworkX in the loop as well. – Fred Foo May 16 '14 at 12:41
  • Thank you for the reply. I was told that, to perform spectral clustering, I need to convert the data into a graph, so I am using networkx for that. Other ideas are most welcome. – user3641802 May 20 '14 at 12:36
  • Graphs are represented in scikit-learn as connectivity matrices. scikit-learn doesn't talk to NetworkX. – Fred Foo May 20 '14 at 13:10
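
To make the point from the comments concrete, here is a minimal sketch that skips NetworkX and turns the distance matrix into a similarity matrix before clustering. The random two-blob data stands in for the real 3000x50 features. The Gaussian (RBF) kernel and the median-distance bandwidth `sigma` are assumptions chosen for illustration, not something the thread prescribes.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import pairwise_distances

# Stand-in for the feature matrix from the question (smaller, random data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(100, 50)),
    rng.normal(loc=5.0, scale=0.5, size=(100, 50)),
])

# pairwise_distances returns distances: a large value means "dissimilar".
distances = pairwise_distances(X, metric="euclidean")

# Convert distances to similarities with a Gaussian (RBF) kernel.
# Assumption: sigma is set to the median distance, one common heuristic.
sigma = np.median(distances)
similarity = np.exp(-distances ** 2 / (2.0 * sigma ** 2))

# The similarity matrix is passed directly as a precomputed affinity.
labels = SpectralClustering(
    n_clusters=2,
    affinity="precomputed",
    assign_labels="discretize",
    random_state=0,
).fit_predict(similarity)
```

Here `labels[i]` is the cluster index of row `i` of `X`. The bandwidth usually needs tuning on real data.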

1 Answer


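adj_matrix = nx.to_numpy_array(G) (nx.to_numpy_matrix(G) in older releases; it was removed in NetworkX 3.0) gives you the adjacency matrix of the graph, which serves as your affinity matrix. Note that nx.from_numpy_matrix goes the other way: it builds a graph from a matrix. Feed the adjacency matrix to scikit-learn like this: SpectralClustering(affinity='precomputed', assign_labels="discretize", random_state=0, n_clusters=2).fit_predict(adj_matrix)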

If you don't have a similarity matrix, you can set the 'affinity' parameter to 'rbf' or 'nearest_neighbors' and pass the feature matrix instead. The example below walks through the entire spectral clustering pipeline:

import matplotlib.pyplot as plt
import networkx as nx
from sklearn.cluster import SpectralClustering

# Graph creation and initialization
G = nx.Graph()
G.add_edge(1, 2)  # default edge weight = 1
G.add_edge(3, 4, weight=0.2)  # weight is the edge weight, i.e. the affinity
G.add_edge(2, 3, weight=0.9)
G.add_edge("Hello", "World", weight=0.6)

# Matrix creation
# adj_matrix[i, j] is the weight of the edge between the i-th and j-th nodes (0 if there is none).
# to_numpy_matrix was removed in NetworkX 3.0; to_numpy_array is its replacement.
adj_matrix = nx.to_numpy_array(G)
node_list = list(G.nodes())  # node order matches the rows/columns of adj_matrix

# Spectral clustering
clusters = SpectralClustering(affinity='precomputed', assign_labels="discretize", random_state=0, n_clusters=2).fit_predict(adj_matrix)

# Node names mix ints and strings, so convert them to strings for the x axis.
node_labels = [str(node) for node in node_list]
plt.scatter(node_labels, clusters, c=clusters, s=50, cmap='viridis')
plt.show()
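
For the case mentioned above where no similarity matrix is available, here is a rough sketch of letting scikit-learn build the affinity itself from the raw features. The random two-blob data and `n_neighbors=10` are assumptions for illustration. With well-separated groups scikit-learn may warn that the neighbor graph is not fully connected.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Stand-in feature matrix: two random blobs in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(100, 50)),
    rng.normal(loc=5.0, scale=0.5, size=(100, 50)),
])

# affinity="nearest_neighbors" builds a k-nearest-neighbors graph from X.
# Assumption: n_neighbors=10 (the scikit-learn default), written out explicitly.
model = SpectralClustering(
    n_clusters=2,
    affinity="nearest_neighbors",
    n_neighbors=10,
    assign_labels="discretize",
    random_state=0,
)
labels = model.fit_predict(X)  # pass the features, not a similarity matrix

# The affinity matrix that was built is available after fitting.
print(model.affinity_matrix_.shape)
```

Switching to `affinity="rbf"` works the same way, but the result then depends heavily on the `gamma` parameter relative to the scale of the distances in the data.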
Amogh Mishra