I am trying to build a distance matrix with array of different lenghts. The distance metric is hausdorff distance which is suitable for this kind of operations. Nonetheless I cant find a way to build a distance matrix using the scipy.cdist
function.
I looked here for scipy cdist docs and here for hausdorff distance pip install traj-dist
and here similar question.
Now I can get the distance between two arrays with either scipy
or traj_dist
libraries.
import numpy as np
from scipy.spatial import distance
import traj_dist.distance as tdist
from scipy.spatial.distance import directed_hausdorff
from sklearn.metrics import pairwise_distances
# np.float64 needed for tdist import
arr1 = np.array([10,22,1,22,32,88],
dtype=np.float64).reshape(3,2)
arr2 = np.array([1,22,32,88,55,11,99,1233],
dtype=np.float64).reshape(4,2)
# measuring just for 1 array at the time works fine
tdist.hausdorff(array_of_arrays[0],array_of_arrays[1])
directed_hausdorff(array_of_arrays[0],array_of_arrays[1])
I can calculate a distance matrix with a nested for loop, but that is extremely slow when n_observation
is big.
n_observations = array_of_arrays.shape[0]
distance_matrix = np.zeros((n_observations, n_observations))
for i in range(n_observations):
for j in range(i + 1, n_observations):
dist = tdist.hausdorff(np.float64(array_of_arrays[i]),
np.float64(array_of_arrays[j]),
type_d='spherical')
distance_matrix[i, j] = dist
distance_matrix[j, i] = dist
But I cant get it to work on a bigger scale using scipy.cdist
.
array_of_arrays = np.array([arr1, arr2])
distance.cdist(array_of_arrays, array_of_arrays,
lambda x, y: tdist.hausdorff(x,y))
distance.cdist(array_of_arrays, array_of_arrays,
lambda x, y: directed_hausdorff(x,y))
The sklearn.metrics.pairwise_distance
does not work either
pairwise_distances(array_of_arrays, metric=tdist.hausdorff)
The question is: How to reshape the array_of_arrays
to use scipy.cdist
on it?
Bonus sub-question:If scipy.cdist
is not suited for such a task, what should I use to avoid the nested for loops and calculate a distance_matrix?