I have a list of strings that I want to group using clustering in Python.

strings = ['String1', 'String2', 'String3', ...]

I want to use the Levenshtein distance, so I am using the jellyfish library. Given two strings, I know that their distance can be computed like this:

jellyfish.levenshtein_distance('string1', 'string2')
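A minimal sketch of how that pairwise call could be extended to the whole list: the distance for every pair (i, j) with i < j is collected in the condensed order that scipy's linkage accepts as its first argument. The `levenshtein` helper is a plain-Python stand-in, assumed to return the same values as jellyfish.levenshtein_distance, so the example runs without extra dependencies. `condensed_distances` and the sample `words` are hypothetical names used only for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (stand-in for jellyfish.levenshtein_distance)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def condensed_distances(strings):
    """Distances for all pairs (i, j), i < j, in row-major upper-triangle order."""
    return [levenshtein(a, b) for a, b in combinations(strings, 2)]


# Illustrative data, not from the question.
words = ["kitten", "sitting", "mitten", "fitting"]
dists = condensed_distances(words)
```

For n strings the result has n * (n - 1) / 2 entries; with jellyfish installed, `levenshtein` could presumably be replaced by jellyfish.levenshtein_distance directly.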

My problem is that I don't know how to use scipy.cluster.hierarchy to get, for each cluster, a Python list of the strings it contains. I have also tried the linkage function:

linkage(y[, method, metric])

But I am not able to get the final list of clusters.

bad_coder
Muny
    Have a look here: http://stackoverflow.com/questions/21638130/tutorial-for-scipy-cluster-hierarchy – tfv Apr 30 '16 at 05:38

1 Answer

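Once you have run linkage on your pairwise distances (passed in condensed form, i.e. a 1D array with one entry per pair of strings) to build the hierarchical clustering, use scipy.cluster.hierarchy.cut_tree to cut the tree into the number of clusters you want. For example, for two clusters: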

from scipy import cluster
cluster.hierarchy.cut_tree(linkage_output, n_clusters=2).ravel()  # .ravel() flattens the (n, 1) result into a 1D array of cluster labels
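To go from those labels to an actual list of clusters, something along these lines might work. It is only a sketch: the `levenshtein` helper is a plain-Python stand-in assumed to match jellyfish.levenshtein_distance, `cluster_strings` and the sample `words` are hypothetical names, and average linkage is an arbitrary choice rather than anything the question specifies.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import cut_tree, linkage


def levenshtein(a, b):
    """Edit distance; swap in jellyfish.levenshtein_distance if it is installed."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def cluster_strings(strings, n_clusters):
    """Return a list of clusters, each cluster being a list of strings."""
    # Condensed distance vector: one entry per pair (i, j) with i < j.
    y = np.array([levenshtein(a, b) for a, b in combinations(strings, 2)],
                 dtype=float)
    Z = linkage(y, method="average")  # assumption: average linkage
    labels = cut_tree(Z, n_clusters=n_clusters).ravel()
    # Group the original strings by their cluster label.
    groups = defaultdict(list)
    for s, label in zip(strings, labels):
        groups[int(label)].append(s)
    return list(groups.values())


# Illustrative data, not from the question.
words = ["kitten", "mitten", "sitting", "fitting"]
clusters = cluster_strings(words, 2)
```

Each inner list of `clusters` holds the strings that share a label, so no further indexing into the label array is needed.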
MattDMo
Hadij