When trying to calculate a correlation with NumPy's corrcoef
cc = np.corrcoef(y, y2)
where the shapes of y and y2 are both (32383, 1), my computer hangs. Not only the interpreter, but the entire computer.
How can this be, and how can I fix it?
According to the documentation, by default "each row represents a variable, with observations in the columns". A shape of (32383, 1) therefore means 32383 variables with 1 observation each, which is meaningless for correlation purposes. So, the computer's behavior aside, the way to proceed is
cc = np.corrcoef(y, y2, rowvar=False)
indicating that your columns correspond to variables.
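The effect of rowvar can be seen on a small stand-in for the real data. This is a minimal sketch, assuming synthetic arrays of shape (5, 1) in place of the (32383, 1) ones; the names n, rng, cc_rows, r and r_flat are made up for illustration.

```python
import warnings
import numpy as np

rng = np.random.default_rng(0)
n = 5  # small stand-in for 32383
y = rng.normal(size=(n, 1))
y2 = 2.0 * y + rng.normal(scale=0.1, size=(n, 1))

# Default rowvar=True: every row of y and y2 is treated as a variable,
# so the result has one row and one column per input row.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # one observation per variable yields NaNs
    cc_rows = np.corrcoef(y, y2)
print(cc_rows.shape)  # (10, 10)

# rowvar=False: each column is a variable, so there are only two variables.
cc = np.corrcoef(y, y2, rowvar=False)
print(cc.shape)  # (2, 2)
r = cc[0, 1]  # the off-diagonal entry is the correlation between y and y2

# Flattening both inputs to 1-D gives the same coefficient.
r_flat = np.corrcoef(y.ravel(), y2.ravel())[0, 1]
```

With the default setting the output grows quadratically with the number of rows, which is why the (32383, 1) inputs are a problem, while rowvar=False keeps it at 2 by 2 regardless of length.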
By default, np.corrcoef(y, y2) treats every row as a variable and stacks the rows of y2 below those of y, so two shape (32383, 1) arrays produce a (64766, 64766) result. Such an array has size (64766 * 64766 * 8) / (1024 ** 3) ≈ 31.3 GiB. With the overhead of computing the result, you likely need on the order of tens of GB of memory for this.
What is probably happening is not that your computer hangs, but that the computation exceeds your RAM and the system starts paging, that is, using the hard drive to store partial results. This is exceedingly slow, so it will seem as if your computer has hung. Given enough time, it should give a result.
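The size arithmetic can be checked without allocating anything. A minimal sketch, where square_float64_gib is a hypothetical helper: it evaluates the size of a square float64 matrix for 32383 variables, and also for 64766, which is the variable count when np.corrcoef stacks the rows of both inputs.

```python
def square_float64_gib(n_vars: int) -> float:
    """Size in GiB of an (n_vars, n_vars) array of 8-byte floats."""
    return n_vars * n_vars * 8 / 1024**3

n_rows = 32383
one_input = square_float64_gib(n_rows)    # rows of a single input as variables
stacked = square_float64_gib(2 * n_rows)  # rows of y and y2 stacked together

print(f"{one_input:.2f} GiB")  # 7.81 GiB
print(f"{stacked:.2f} GiB")    # 31.25 GiB
```

Either figure is for the result matrix alone and does not count temporaries, so on a typical desktop the computation is likely to spill into swap.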
If you instead want the correlation between the two columns (a (2, 2) matrix whose off-diagonal entries hold the coefficient), you can do:
cc = np.corrcoef(y, y2, rowvar=False)