I've put different values into this function and observed the output, but I can't find a predictable pattern in what is being output.
Then I tried digging through the function itself, but it's confusing because it can do a number of different calculations.
According to the Docs:
Compute the distance matrix from a vector array X and optional Y.
I see it returns a square matrix whose height and width equal the number of nested lists passed in, implying that it compares each list against every other.
But otherwise I'm having a tough time understanding what it's doing and where the values are coming from.
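To make the pairwise comparison concrete, here is a minimal sketch of how I currently read it: every row is compared against every other row, which gives an n-by-n matrix. It assumes the textbook definition of correlation distance (1 minus the Pearson correlation of the two rows). `correlation_distance` and `pairwise` are hypothetical helper names of my own, not sklearn functions, and this is not necessarily how sklearn computes the values internally.

```python
import math

def correlation_distance(u, v):
    """Textbook correlation distance (assumed): 1 - Pearson r of u and v."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cu = [a - mean_u for a in u]  # centre each row on its own mean
    cv = [b - mean_v for b in v]
    dot = sum(a * b for a, b in zip(cu, cv))
    norm_u = math.sqrt(sum(a * a for a in cu))
    norm_v = math.sqrt(sum(b * b for b in cv))
    if norm_u == 0 or norm_v == 0:
        return float("nan")  # constant row: 0/0, correlation is undefined
    return 1.0 - dot / (norm_u * norm_v)

def pairwise(rows):
    """Hypothetical helper: compare every row with every other row."""
    n = len(rows)
    return [
        [0.0 if i == j else correlation_distance(rows[i], rows[j])
         for j in range(n)]
        for i in range(n)
    ]

X = [[1, 2, 3], [3, 2, 1], [1, 3, 2]]
D = pairwise(X)
for row in D:
    print(row)
```

With three rows the result is 3 by 3: the diagonal is a row compared with itself, and entry (i, j) is the distance between row i and row j. Rows that move in opposite directions, like [1, 2, 3] and [3, 2, 1], should come out close to 2, up to floating-point rounding.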
Examples I've tried:
from sklearn.metrics import pairwise_distances
pairwise_distances([[1]], metric='correlation')
>>> array([[0.]])
pairwise_distances([[1], [1]], metric='correlation')
>>> array([[ 0., nan],
>>> [nan, 0.]])
# returns the same as the previous call although the input values differ
pairwise_distances([[1], [2]], metric='correlation')
>>> array([[ 0., nan],
>>> [nan, 0.]])
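A sketch of why I suspect the single-value rows give nan, assuming the textbook formula (centre each row on its own mean, then divide by the product of the norms): a row with one value always centres to 0, so the division becomes 0/0. Plain Python raises on that division, so the sketch substitutes nan by hand, which I assume is what NumPy-style arithmetic would produce instead.

```python
import math

rows = [[1], [2]]  # each row has a single feature

centred_rows = []
norms = []
for row in rows:
    mean = sum(row) / len(row)            # the mean of one value is that value
    centred = [x - mean for x in row]     # so the centred row is always [0.0]
    centred_rows.append(centred)
    norms.append(math.sqrt(sum(c * c for c in centred)))  # norm is 0.0

dot = sum(a * b for a, b in zip(centred_rows[0], centred_rows[1]))  # 0.0

try:
    r = dot / (norms[0] * norms[1])       # 0.0 / 0.0
except ZeroDivisionError:
    r = float("nan")                      # assumption: array libraries give nan here

print(centred_rows, norms, r)
```

Under that assumption, [[1], [1]] and [[1], [2]] would be indistinguishable, because both rows collapse to [0.0] after centring regardless of the value they hold.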
pairwise_distances([[1,2], [1,2]], metric='correlation')
>>> array([[0.00000000e+00, 2.22044605e-16],
>>> [2.22044605e-16, 0.00000000e+00]])
# returns the same as the previous call although the input values differ
# I incorrectly expected a larger distance because the input values differ more
pairwise_distances([[1,2], [1,3]], metric='correlation')
>>> array([[0.00000000e+00, 2.22044605e-16],
>>> [2.22044605e-16, 0.00000000e+00]])
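A sketch of why [1,2] and [1,3] might give the same result, again assuming the distance is 1 minus the Pearson correlation: correlation only looks at how values move relative to each row's own mean, so with two values per row, any increasing row is perfectly correlated with any other increasing row. `pearson_r` is a hypothetical helper of my own.

```python
import math

def pearson_r(u, v):
    """Pearson correlation of two equal-length rows (textbook form)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cu = [a - mean_u for a in u]
    cv = [b - mean_v for b in v]
    dot = sum(a * b for a, b in zip(cu, cv))
    norm_u = math.sqrt(sum(a * a for a in cu))
    norm_v = math.sqrt(sum(b * b for b in cv))
    return dot / (norm_u * norm_v)

r_same = pearson_r([1, 2], [1, 2])      # identical rows
r_steeper = pearson_r([1, 2], [1, 3])   # both increase, second one faster
r_reversed = pearson_r([1, 2], [2, 1])  # one increases, the other decreases

print(r_same, r_steeper, r_reversed)
```

If this reading is right, the first two correlations are both about 1 (distance about 0), and only the reversed row gives a correlation of about -1 (distance about 2). Rows with three or more values leave room for intermediate results.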
Computing correlation distance with Scipy
I don't understand where the sklearn 2.22044605e-16 value is coming from if scipy returns 0.0 for the same inputs.
# Scipy
import scipy.spatial.distance
scipy.spatial.distance.correlation([1,2], [1,2])
>>> 0.0
# Sklearn
pairwise_distances([[1,2], [1,2]], metric='correlation')
>>> array([[0.00000000e+00, 2.22044605e-16],
>>> [2.22044605e-16, 0.00000000e+00]])
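One possible explanation for the 2.22044605e-16, sketched under assumptions I have not verified against either library's source: that number matches the float64 machine epsilon, so it may simply be rounding error from the centre-and-normalise form of the formula, while a different ordering of the same arithmetic happens to land exactly on 0.0.

```python
import math
import sys

u = [1, 2]
v = [1, 2]

mean_u = sum(u) / len(u)
mean_v = sum(v) / len(v)
cu = [a - mean_u for a in u]   # [-0.5, 0.5]
cv = [b - mean_v for b in v]   # [-0.5, 0.5]

dot = sum(a * b for a, b in zip(cu, cv))     # 0.25 + 0.25 = 0.5 exactly
norm_u = math.sqrt(sum(a * a for a in cu))   # sqrt(0.5), not exactly representable
norm_v = math.sqrt(sum(b * b for b in cv))

# Squaring a rounded square root need not give back exactly 0.5,
# so the ratio below can miss 1.0 by one rounding step.
denominator = norm_u * norm_v
distance = 1.0 - dot / denominator

print(distance)                 # tiny, on the order of machine epsilon
print(sys.float_info.epsilon)   # 2.220446049250313e-16
```

If that is what is happening, the two results agree to within floating-point precision, and comparing them with a tolerance (for example `math.isclose(a, b, abs_tol=1e-12)`) would treat them as equal.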
I'm not looking for a high-level explanation but for an example of how the numbers are calculated.