I have a manifold learning / non-linear dimensionality reduction problem where I know the distances between objects up to some threshold; beyond that, I only know that the distance is "far". In some cases, some of the distances may also be missing. I am trying to use sklearn.manifold to find a 1D representation. A natural encoding would be to represent "far" distances as inf and missing distances as nan.
However, it seems that scikit-learn currently does not support nan and inf values in distance matrices passed to the manifold learning functions in sklearn.manifold, since I get ValueError: Array contains NaN or infinity.
Is there a conceptual reason for this? Some methods seem especially well suited to inf, e.g. non-metric MDS. I also know that some implementations of these methods in other languages can handle missing/inf values.
Instead of using inf, I have considered setting "far" values to a very large number, but I am not sure how this would affect the results.
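For concreteness, the workaround I have in mind looks roughly like the sketch below. The helper name cap_far_distances and the factor of 10 are arbitrary choices of mine for illustration, not anything provided by scikit-learn. NaN entries are deliberately left untouched, so missing distances would still need separate handling.

```python
import numpy as np

def cap_far_distances(D, factor=10.0):
    """Return a copy of D with inf replaced by factor * (largest finite distance).

    Hypothetical helper: the name and the default factor are illustrative only.
    NaN entries (missing distances) are left as they are.
    """
    D = np.asarray(D, dtype=float).copy()
    finite = np.isfinite(D)                  # False for both inf and nan
    far_value = factor * D[finite].max()     # large but finite stand-in for "far"
    D[np.isinf(D)] = far_value               # only inf is replaced, nan is kept
    return D

# Toy 3x3 distance matrix: objects 0 and 2 are "far" from each other.
D = np.array([[0.0, 1.0, np.inf],
              [1.0, 0.0, 2.0],
              [np.inf, 2.0, 0.0]])
D_capped = cap_far_distances(D)
```

The open question is how sensitive the resulting embedding is to the choice of factor, since a metric method will try to reproduce that artificial distance literally.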
Update:
I dug into the code of sklearn.manifold.MDS._smacof_single() and found a piece of code with a comment saying that "similarities with 0 are considered as missing values". Is this an undocumented way to specify missing values? Does it work with all manifold functions?