How to create a seaborn clustermap based on rows and extract row labels?

Question

I have a dataframe with a list of items and associated values.

I want to create a seaborn clustermap (dendrogram plus heatmap) from it, clustering on rows only. That part is done, as shown in the code below. How can I get the list of items in each cluster, that is, each protein together with its cluster assignment? (This is similar to "Extract rows of clusters in hierarchical clustering using seaborn clustermap", but based on rows only, not columns.)
How do I determine which "method" and "metric" are best for my data?

data.csv example:

item,v1,v2,v3,v4,v5
A1,1,2,3,4,5
B1,2,4,6,8,10
C1,3,6,9,12,15
A1,2,3,4,5,6
B2,3,5,7,9,11
C2,4,7,10,13,16

My code for creating the clustermap:

import pandas as pd
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt
import seaborn as sns

# Use the first column (item) as the row index
df = pd.read_csv('data.csv', index_col=0)
# Cluster rows only; columns keep their original order
sns.clustermap(df, col_cluster=False, cmap="coolwarm", method='ward', metric='euclidean', figsize=(40,40))
plt.savefig('plot.pdf', dpi=300)

Clustering is unsupervised: there are metrics that tell you whether the clusters are stable or explain more variance, but in the end it's quite subjective. It depends on your end goal, and you have to be clear about it yourself. You can try different hierarchical clustering methods and pass the resulting linkage to clustermap with the row_linkage= option. – StupidWolf, Jun 27 '20 at 23:11
@StupidWolf Thank you very much. Is there any way to check which method works best on our data (some kind of validation)? – dar102, Jun 30 '20 at 10:06
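
One possible sanity check for the validation question above, offered as a rough sketch rather than a definitive answer, is the cophenetic correlation coefficient: it measures how well the dendrogram distances preserve the original pairwise distances, and a higher value suggests a more faithful tree. The list of methods tried here and the choice of Euclidean distance are assumptions for illustration, and the data is the sample from the question.

```python
import pandas as pd
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# Sample values from the question; row labels do not matter for this check.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5],
     [2, 4, 6, 8, 10],
     [3, 6, 9, 12, 15],
     [2, 3, 4, 5, 6],
     [3, 5, 7, 9, 11],
     [4, 7, 10, 13, 16]],
    columns=["v1", "v2", "v3", "v4", "v5"],
)

# Condensed pairwise Euclidean distances between rows.
dists = pdist(df.to_numpy(), metric="euclidean")

# Cophenetic correlation for each candidate linkage method (assumed list).
scores = {}
for method in ["single", "complete", "average", "ward"]:
    Z = linkage(dists, method=method)
    coeff, _ = cophenet(Z, dists)
    scores[method] = coeff

# Method whose tree best preserves the original distances.
best = max(scores, key=scores.get)
print(scores, best)
```

The linkage built with the chosen method can then be handed to clustermap through row_linkage=, as the comment above suggests. This score is only one criterion; as noted, the choice stays somewhat subjective.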

ASH · Answer 1 · 2020-06-29T13:32:45.267

I just hacked this together. Is this what you want?

import pandas as pd
import numpy as np
import seaborn as sns

cars = {'item': ['A1','B1','C1','A1','B1','C1'],
        'v1': [1.0,2.0,3.0,2.0,3.0,4.0],
        'v2': [2.0,4.0,6.0,3.0,5.0,7.0],
        'v3': [3.0,6.0,9.0,4.0,7.0,10.0],
        'v4': [4.0,8.0,12.0,5.0,9.0,13.0],
        'v5': [5.0,10.0,15.0,6.0,11.0,16.0]
        }

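df = pd.DataFrame(cars)

# Average the values of rows that share an item label, giving one row per item
heatmap_data = pd.pivot_table(df, values=['v1','v2','v3','v4','v5'],
                              index=['item'])
sns.clustermap(heatmap_data)

# Alternatively, drop the labels and cluster the raw rows
df = df.drop(['item'], axis=1)
g = sns.clustermap(df)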

Also, check out the links below for more info on this topic.

https://seaborn.pydata.org/generated/seaborn.clustermap.html

https://kite.com/python/docs/seaborn.clustermap

edited Jun 29 '20 at 13:32

answered Jun 27 '20 at 21:29

ASH


Thank you so much, but I'm looking for a seaborn clustermap :) – dar102 Jun 28 '20 at 06:21
Sorry, I saw clustering and thought you were referring to something else. I just updated my answer. Hope that helps. The only thing I couldn't really understand is the 'extracting rows' comment you made. – ASH Jun 29 '20 at 13:33
Thank you for your efforts, but this much is already done in my code. The clustermap based on rows only is also done in my code (with the option col_cluster=False). What I want is a list of my items showing which cluster each belongs to after they have been clustered based on rows. – dar102 Jun 30 '20 at 10:01
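
For the list asked for in the last comment, one approach that may work is to compute the row linkage with scipy and cut the tree with fcluster. This is a minimal sketch, not the answerer's method. The number of clusters (n_clusters = 2) is a hypothetical choice, and the fourth row is labelled A2 here on the assumption that row labels are meant to be unique.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage

# Sample data from the question; "A2" is an assumed label for the fourth row.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5],
     [2, 4, 6, 8, 10],
     [3, 6, 9, 12, 15],
     [2, 3, 4, 5, 6],
     [3, 5, 7, 9, 11],
     [4, 7, 10, 13, 16]],
    index=["A1", "B1", "C1", "A2", "B2", "C2"],
    columns=["v1", "v2", "v3", "v4", "v5"],
)

# Row linkage with the same method and metric as the question's clustermap call.
Z = linkage(df.to_numpy(), method="ward", metric="euclidean")

# Cut the tree into at most n_clusters flat clusters (hypothetical value).
n_clusters = 2
labels = fcluster(Z, t=n_clusters, criterion="maxclust")

# One cluster id per item, indexed by the row label.
clusters = pd.Series(labels, index=df.index, name="cluster")

# Items grouped by cluster id.
members = {cid: list(grp.index) for cid, grp in clusters.groupby(clusters)}

# Row labels in dendrogram leaf order.
row_order = list(df.index[leaves_list(Z)])

print(clusters)
print(members)
print(row_order)
```

Passing row_linkage=Z (with col_cluster=False) to sns.clustermap should draw the same tree that was cut here. If the clustermap has already been drawn as g, the linkage is also available as g.dendrogram_row.linkage, and the plotted row order as g.dendrogram_row.reordered_ind.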
