I have a dataframe with a list of items and associated values. Which metric and method are best for performing the clustering?
I want to create a seaborn clustermap (dendrogram plus heatmap) from this data, clustering on rows only. The plotting part works, as shown in the code below, but how can I get the list of items in each cluster, or each protein together with its cluster assignment? (This is similar to "Extract rows of clusters in hierarchical clustering using seaborn clustermap", but based only on rows, not columns.)
How do I determine which "method" and "metric" are best for my data?
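To make the "method"/"metric" question concrete, here is a minimal sketch of one common heuristic: computing the cophenetic correlation coefficient for several combinations with SciPy. The helper name `cophenetic_scores`, the candidate lists, and the inline toy values are assumptions for illustration, and a higher score only means the dendrogram preserves the pairwise distances better, not that the clustering is "best" in any absolute sense.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

# Toy values laid out like data.csv (illustrative only, one row per item).
values = np.array([
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [3, 6, 9, 12, 15],
    [2, 3, 4, 5, 6],
    [3, 5, 7, 9, 11],
    [4, 7, 10, 13, 16],
], dtype=float)

# Linkage methods that are only meaningful with Euclidean distances.
EUCLIDEAN_ONLY = {"ward", "centroid", "median"}

def cophenetic_scores(data, methods, metrics):
    """Hypothetical helper: cophenetic correlation per (method, metric) pair."""
    scores = {}
    for metric in metrics:
        dists = pdist(data, metric=metric)  # condensed pairwise distances
        for method in methods:
            if method in EUCLIDEAN_ONLY and metric != "euclidean":
                continue  # skip combinations that are not well defined
            Z = linkage(dists, method=method)
            coeff, _ = cophenet(Z, dists)
            scores[(method, metric)] = coeff
    return scores

scores = cophenetic_scores(
    values,
    methods=["ward", "average", "complete", "single"],
    metrics=["euclidean", "cityblock", "chebyshev"],
)
# Print the combinations from highest to lowest cophenetic correlation.
for (method, metric), coeff in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{method:>8} / {metric:<10} {coeff:.3f}")
```

Average linkage tends to score well on this measure by construction, so it is probably worth combining it with a visual check of the heatmap rather than relying on the number alone.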
data.csv example:
item,v1,v2,v3,v4,v5
A1,1,2,3,4,5
B1,2,4,6,8,10
C1,3,6,9,12,15
A2,2,3,4,5,6
B2,3,5,7,9,11
C2,4,7,10,13,16
My code for creating the clustermap:
import pandas as pd
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.cluster.hierarchy as sch
df = pd.read_csv('data.csv', index_col=0)
sns.clustermap(df, col_cluster=False, cmap="coolwarm", method='ward', metric='euclidean', figsize=(40,40))
plt.savefig('plot.pdf', dpi=300)
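For the cluster-membership part, the following is a minimal sketch of what I am after, assuming the object returned by `sns.clustermap` exposes the row linkage as `g.dendrogram_row.linkage` and that cutting the tree into a fixed number of clusters is acceptable. The choice of 2 clusters, the inline toy dataframe, and the unique row labels (fourth row taken as `A2`) are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd
import seaborn as sns
from scipy.cluster.hierarchy import fcluster

# Toy dataframe mirroring data.csv (assumes unique row labels).
df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [3, 6, 9, 12, 15],
     [2, 3, 4, 5, 6], [3, 5, 7, 9, 11], [4, 7, 10, 13, 16]],
    index=["A1", "B1", "C1", "A2", "B2", "C2"],
    columns=["v1", "v2", "v3", "v4", "v5"],
)

# Keep a handle on the ClusterGrid so the row dendrogram can be reused.
g = sns.clustermap(df, col_cluster=False, cmap="coolwarm",
                   method="ward", metric="euclidean")

# Linkage matrix that seaborn computed for the rows.
Z = g.dendrogram_row.linkage

# Cut the tree into at most 2 flat clusters (2 is an arbitrary choice here).
labels = fcluster(Z, t=2, criterion="maxclust")

# fcluster returns labels in the original row order, so df.index lines up.
clusters = pd.Series(labels, index=df.index, name="cluster")

# Items grouped by cluster label.
items_per_cluster = {
    int(label): list(members.index)
    for label, members in clusters.groupby(clusters)
}
print(clusters)
print(items_per_cluster)
```

Using `criterion="distance"` with a height threshold instead of `"maxclust"` would cut the dendrogram at a fixed distance, which may match the plotted tree more naturally.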