3

I want to apply sklearn graph clustering algorithms but they don't accept input from networkx in .gexf format. What kind of library/transformations do I need to turn my .gexf graphs suitable for sklearn?

Eskapp
  • 3,419
  • 2
  • 22
  • 39
Mnemosyne
  • 1,162
  • 4
  • 13
  • 45
  • Which clustering algorithm do you want to apply? networkx has some clustering algorithms like k-nearest neighbors implemented https://networkx.github.io/documentation/networkx-1.9.1/reference/algorithms.html – Eskapp Apr 09 '19 at 19:40
  • I want to use Affinity Propagation and Spectral Clustering. Not the clustering algorithms included in the networkx library – Mnemosyne Apr 09 '19 at 21:04
  • 1
    @Mnemosyne have you looked here https://stackoverflow.com/questions/46258657/spectral-clustering-a-graph-in-python ? – hellpanderr Apr 09 '19 at 22:40

1 Answers1

3

Cluster algorithms accept either distance matrices, affinity matrices, or feature matrices. For example, kmeans would accept a feature matrix (say X of n points of m dimensions) and apply the Euclidean distance metric, while affinity propagation accepts an affinity matrix (i.e. a square matrix D of nxn dimensions) or a feature matrix (depending on the affinity parameter).

If you want to apply a sklearn (or just non-graph) cluster algorithm, you can extract adjacency matrices from networkx graphs.

A = nx.to_scipy_sparse_matrix(G)

I guess you should make sure, your diagonal is 1; do numpy.fill_diagonal(D, 1) if not.

This then leaves only applying the clustering algorithm:

from sklearn.cluster import AffinityPropagation


ap = AffinityPropagation(affinity='precomputed').fit(A)
print(ap.labels_)

You can also convert your adjacency matrix to a distance matrix if you want to apply other algorithms or even project your adjacency/distance matrix to a feature matrix.

To go through all of this would go too far, however, as for getting the distance matrix, if you have binary edges, you can do D = 1 - A; if you have weighted edges you could D = A.max() - A.

ben26941
  • 1,580
  • 14
  • 20