I have a dataframe with two classes, 'yes' and 'no'. Using SciPy hierarchical clustering I found 2 clusters. Here is my code:
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist
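# Condensed pairwise Manhattan (city-block) distances between the scaled rows
Mdist_matrix = pdist(x_Minmax, metric='cityblock')
# Single-linkage clustering on the precomputed distances
MSlink = linkage(Mdist_matrix, method='single', metric='cityblock')
# Cut the tree into at most k flat clusters
crsm = fcluster(MSlink, k, criterion='maxclust')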
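# Cluster IDs and their sizes
arr = np.unique(crsm, return_counts=True)
# print(arr)
dfcluster = dfcluster.copy()
dfcluster['Clabels'] = pd.Series(crsm, index=dfcluster.index)
# Cluster membership counts for the 'no' class (status == 0)
No = dfcluster[df['status'] == 0]['Clabels'].value_counts()
print("CNO\n", No)
# Cluster membership counts for the 'yes' class (status == 1)
Yes = dfcluster[df['status'] == 1]['Clabels'].value_counts()
print("Cyes\n", Yes)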
The output looks like this one
I want to compute the entropy and the purity of each cluster. How can I compute the probability of 'yes' and 'no' in each of the clusters? I tried to follow "Fastest way to compute entropy in Python", but it is not clear to me.
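
For reference, here is a minimal sketch of what I think the computation should look like, assuming the probability of a class in a cluster is just its share of that cluster's members. The helper name cluster_entropy_purity and the toy labels below are made up for illustration and are not my real data; entropy is in bits (log base 2), and purity is the share of the majority class.

```python
import numpy as np
import pandas as pd

def cluster_entropy_purity(cluster_labels, class_labels):
    """Per-cluster class probabilities, entropy (bits), purity, and overall purity.

    Hypothetical helper written for illustration only.
    """
    # Contingency table: rows are clusters, columns are classes, cells are counts.
    counts = pd.crosstab(
        pd.Series(np.asarray(cluster_labels), name="cluster"),
        pd.Series(np.asarray(class_labels), name="status"),
    )
    # P(class | cluster): divide each row by its row total.
    probs = counts.div(counts.sum(axis=1), axis=0)
    # Entropy per cluster, treating 0 * log2(0) as 0.
    p = probs.to_numpy(dtype=float)
    logp = np.log2(p, where=p > 0, out=np.zeros_like(p))
    entropy = pd.Series(-(p * logp).sum(axis=1), index=probs.index, name="entropy")
    # Purity per cluster: probability of the majority class.
    purity = probs.max(axis=1).rename("purity")
    # Overall purity: majority-class counts summed over clusters, divided by N.
    overall_purity = counts.max(axis=1).sum() / counts.to_numpy().sum()
    return probs, entropy, purity, overall_purity

# Toy labels (made up): cluster IDs as returned by fcluster, status 0 = 'no', 1 = 'yes'.
toy_clusters = [1, 1, 1, 1, 2, 2, 2, 2]
toy_status = [0, 0, 0, 1, 1, 1, 1, 1]
probs, entropy, purity, overall = cluster_entropy_purity(toy_clusters, toy_status)
print(probs)
print(entropy)
print(purity)
print("overall purity:", overall)
```

With my variables, the call would presumably be cluster_entropy_purity(crsm, df['status']), assuming crsm and df['status'] are in the same row order.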