Consider a data frame as shown here.

import pandas as pd

data = [
    {'col1': '101', 'col2': '101', 'col3': '1321'},
    {'col1': '99', 'col2': '99', 'col3': '101'},
    {'col1': '21', 'col2': '23', 'col3': '99'},
    {'col1': '47', 'col2': '67', 'col3': '47'},
    {'col1': '1321', 'col2': '47', 'col3': '23'},
]
df = pd.DataFrame(data)

How can I calculate Jaccard similarity between each column and then plot it on a heatmap?

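Something like this does not seem right:

from scipy.spatial.distance import pdist

df111 = df.to_numpy()
# pdist compares the rows of the array pairwise, not the columns
res = 1 - pdist(df111, 'jaccard')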

1 Answer

I found a solution in another thread: "How to compute jaccard similarity from a pandas dataframe".

Posting the solution from that thread here; credit goes to ayhan.

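from sklearn.metrics.pairwise import pairwise_distances
import seaborn as sns

# df is the data frame from the question; transposing turns each column into a row
# 1 - Hamming distance = fraction of row positions where two columns hold the same value
jac_sim = 1 - pairwise_distances(df.T, metric="hamming")
jac_sim = pd.DataFrame(jac_sim, index=df.columns, columns=df.columns)
sns.heatmap(jac_sim)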
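
One caveat: `1 - hamming` compares the columns position by position, so it only counts a value as shared when it sits in the same row of both columns. If what you want is the set-based Jaccard index, |A ∩ B| / |A ∪ B| over the distinct values in each column regardless of row order, the numbers will differ. Below is a minimal sketch of that variant in plain Python. The names `jaccard_matrix` and `columns` are made up for illustration, and treating two empty columns as identical is my own assumption.

```python
from itertools import product


def jaccard_matrix(columns):
    """Set-based Jaccard similarity for every pair of columns.

    `columns` maps a column name to its values, e.g. df.to_dict("list").
    """
    sets = {name: set(values) for name, values in columns.items()}
    result = {a: {} for a in sets}
    for a, b in product(sets, repeat=2):
        union = sets[a] | sets[b]
        # Assumption: two empty columns are treated as identical.
        result[a][b] = len(sets[a] & sets[b]) / len(union) if union else 1.0
    return result


# Same values as the question's data frame, written column by column.
columns = {
    "col1": ["101", "99", "21", "47", "1321"],
    "col2": ["101", "99", "23", "67", "47"],
    "col3": ["1321", "101", "99", "47", "23"],
}
jac = jaccard_matrix(columns)
```

With the question's data frame you could call `jaccard_matrix(df.to_dict("list"))` instead of typing the columns out, then wrap the result with `pd.DataFrame(jac)` and pass it to `sns.heatmap(..., annot=True, vmin=0, vmax=1)` to get the plot.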