I have a set of phrases that I want to group into categories, as in the example below.
Example:
adaptive and intelligent educational system
adaptive and intelligent tutoring system
adaptive educational system
A human can easily see that these three phrases belong in the same category.
Is there an easy way to do this automatically?
Currently, I am using the affinity propagation clustering algorithm with Levenshtein distance, as follows.
import numpy as np
import sklearn.cluster
import distance  # third-party "distance" package, provides levenshtein()

# words is my list of phrases; convert it so that indexing with an index array works
words = np.asarray(words)
# Affinity propagation expects similarities, so negate the edit distances
lev_similarity = -1 * np.array([[distance.levenshtein(w1, w2) for w1 in words] for w2 in words])

affprop = sklearn.cluster.AffinityPropagation(affinity="precomputed", damping=0.5)
affprop.fit(lev_similarity)
for cluster_id in np.unique(affprop.labels_):
    exemplar = words[affprop.cluster_centers_indices_[cluster_id]]
    cluster = np.unique(words[np.nonzero(affprop.labels_ == cluster_id)])
    cluster_str = ", ".join(cluster)
    print(" - *%s:* %s" % (exemplar, cluster_str))
However, this does not give me the desired clusters. Could you suggest a more suitable approach?
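To make the problem easier to reproduce, here is a minimal sketch that prints the pairwise character-level Levenshtein distances for the three example phrases. The `levenshtein` function below is my own stand-in (a standard dynamic-programming implementation), not the one from the `distance` package, so the snippet only needs NumPy. It shows, for instance, that the first and third phrases are 16 character edits apart even though they differ by just two words.

```python
import numpy as np


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (stand-in helper, standard DP)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


phrases = [
    "adaptive and intelligent educational system",
    "adaptive and intelligent tutoring system",
    "adaptive educational system",
]

# Symmetric matrix of pairwise distances between the example phrases
dist = np.array([[levenshtein(p, q) for q in phrases] for p in phrases])
print(dist)
```

Negating this matrix gives the same kind of input that I pass to `AffinityPropagation` above.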