I am trying to hierarchically cluster a 2D numpy array so that it looks good when I graph it as a correlation matrix in d3.js.
My data looks like this:
[[ 1. 0.091 0.147 ..., -0.239 0.113 -0.012 ]
[ 0.091 1. -0.153 ..., -0.004 -0.244 -0.00520801]
[ 0.147 -0.153 1. ..., -0.157 0.013 0.133]
...,
[-0.239 -0.004 -0.157 ..., -0.265 -0.362 1. ]]
I calculated these as Pearson correlation coefficients, so every value lies between -1 and 1. As you can see, the diagonal from the top left of the array to the bottom right is all 1s, since each variable correlates perfectly with itself.
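For reference, a matrix with these properties can be produced along the following lines. This is only a minimal sketch: the raw data `samples` is a hypothetical stand-in, and `np.corrcoef` is assumed here as one common way to obtain Pearson coefficients, not necessarily how my matrix was built.

```python
import numpy as np

# Hypothetical raw data: 5 variables (rows), each observed 200 times (columns).
rng = np.random.default_rng(0)
samples = rng.normal(size=(5, 200))

# np.corrcoef treats each row as one variable and returns the square
# matrix of pairwise Pearson correlation coefficients.
corr = np.corrcoef(samples)

print(corr.shape)  # one row and one column per variable
```

The result is symmetric with 1s on the diagonal, which is the structure shown in the array above.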
If I graph these values, my correlation matrix looks like this:
After clustering, I want it to look somewhat similar to this, where red represents positive correlations and blue represents negative correlations:
Using matplotlib and scipy, I can cluster the coefficients and plot them as a heatmap. However, the values are changed, and I want my values to remain the same.
I used this answer to graph the heatmap in Python, but it's not quite what I want since it changes my values. All I need is to cluster the data and output it to a csv/json file.
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, dendrogram
# If I use this, it gives me an array that is roughly half the size of my
# original correlation matrix. These are the pairwise distances. How do I use
# them to re-order my correlation matrix as a clustered matrix?
data_dist = pdist(final_correlation, 'correlation')
Out[1]: # The size is 9730, as opposed to the original size of 19,600
[ 0.612 0.503 1.653 ..., 0.792 1.577
0.829]
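One possible direction, as a rough sketch rather than a confirmed solution: build a linkage from the condensed distances, read off the dendrogram leaf order, and apply that same permutation to both rows and columns, so values are only moved and never changed. The small random `final_correlation` below is a hypothetical stand-in for my real matrix, and the `'average'` linkage method is an assumption; other methods may order things differently.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist

# Hypothetical stand-in for final_correlation: a small symmetric
# correlation matrix built from random data.
rng = np.random.default_rng(0)
final_correlation = np.corrcoef(rng.normal(size=(6, 50)))

# Condensed pairwise distances between rows: n * (n - 1) / 2 entries.
data_dist = pdist(final_correlation, 'correlation')

# Build the hierarchy (the 'average' method is an assumption) and take
# the left-to-right order of the dendrogram leaves.
Z = linkage(data_dist, method='average')
order = leaves_list(Z)

# Apply the same permutation to rows and columns. The coefficients are
# only rearranged; none of them is recomputed or rescaled.
clustered = final_correlation[np.ix_(order, order)]
```

Because only the ordering changes, `clustered` holds exactly the same coefficients as the input, and it could then be written out with something like `np.savetxt` for use in d3.js.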
UPDATE
If anyone knows R, the code I am trying to execute would probably look something like this: