Introduction
I would like to assess the similarity between two arrays of bin counts (taken from two histograms) using the MATLAB function pdist2:
% Input
bin_counts_a = [689 430 311 135 66 67 99 23 37 19 8 4 3 4 1 3 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1];
bin_counts_b = [569 402 200 166 262 90 50 16 33 12 6 35 49 4 12 8 8 2 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 1];
% Visualize the two "bin counts" vectors as bars:
bar(1:length(bin_counts_a),[bin_counts_a;bin_counts_b])
% Calculation of similarities
cosine_similarity = 1 - pdist2(bin_counts_a,bin_counts_b,'cosine')
jaccard_similarity = 1 - pdist2(bin_counts_a,bin_counts_b,'jaccard')
% Output
cosine_similarity =
0.95473215802008
jaccard_similarity =
0.0769230769230769
Question
The cosine similarity is close to 1, which indicates that the two vectors are similar. Shouldn't the Jaccard similarity be close to 1 as well?
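For reference, here is a minimal sketch that reproduces both numbers by hand. It assumes that the 'jaccard' option of pdist2 follows the documented definition: among the coordinates where at least one vector is nonzero, the fraction of coordinates whose values differ. The variable names cos_manual, either_nonzero, equal_entries and jaccard_manual are only illustrative.

```matlab
% Same inputs as above
bin_counts_a = [689 430 311 135 66 67 99 23 37 19 8 4 3 4 1 3 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1];
bin_counts_b = [569 402 200 166 262 90 50 16 33 12 6 35 49 4 12 8 8 2 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 1];

% Cosine similarity: normalized dot product, so it depends on the
% magnitudes of the counts and is dominated by the largest bins
cos_manual = dot(bin_counts_a, bin_counts_b) / ...
    (norm(bin_counts_a) * norm(bin_counts_b));

% Jaccard similarity (assumed definition, see above): only exact
% equality counts as a match; magnitudes are otherwise ignored
either_nonzero = (bin_counts_a ~= 0) | (bin_counts_b ~= 0);
equal_entries  = (bin_counts_a == bin_counts_b) & either_nonzero;
jaccard_manual = sum(equal_entries) / sum(either_nonzero);

fprintf('cosine  = %.6f\n', cos_manual);
fprintf('jaccard = %d / %d = %.6f\n', ...
    sum(equal_entries), sum(either_nonzero), jaccard_manual);
```

Under that assumption, only 2 of the 26 bins where at least one count is nonzero hold exactly the same value (bins 14 and 50), which gives 2/26 ≈ 0.0769, the same value as the pdist2 output above. So the two measures seem to be answering different questions: cosine compares the overall shape of the counts, while this form of Jaccard only checks whether individual bins are exactly equal.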