I have a text corpus of 1000+ articles, one article per line. I used hierarchical (agglomerative) clustering from scikit-learn in Python to group related articles. This is the code I used for the clustering:
Note: X is a sparse 2D matrix (SciPy sparse format, hence the toarray() call) with rows corresponding to documents and columns corresponding to terms.
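# Agglomerative clustering
from sklearn.cluster import AgglomerativeClustering

# Note: in scikit-learn >= 1.2 the `affinity` parameter is named `metric`.
model = AgglomerativeClustering(affinity="euclidean", linkage="complete", n_clusters=3)
model.fit(X.toarray())  # AgglomerativeClustering needs a dense array
clustering = model.labels_  # one cluster label per document
print(clustering)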
I set n_clusters=3 so that the tree is cut at three clusters, which gives a flat clustering like the one k-means produces.
My question is: how can I get the top N most frequent words in each cluster, so that I can suggest a topic for each cluster? Thanks.