I have a sparse csc_matrix named eventPropMatrix with dtype=float64 and shape=(13000, 7),
to which I am applying the distance function shown below.
Here, for example,
eventPropMatrix.getrow(i).todense()==[[0. 0. 0. 0. 0. 0. 0.]]
eventPropMatrix.getrow(j).todense()==[[0. 0. 0. 0. 0. 0. 0.]]
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    epsim = scipy.spatial.distance.correlation(eventPropMatrix.getrow(i).todense(), eventPropMatrix.getrow(j).todense())
The source of scipy.spatial.distance.correlation is as follows:
def correlation(u, v, w=None, centered=True):
    """
    Compute the correlation distance between two 1-D arrays.

    The correlation distance between `u` and `v`, is
    defined as

    .. math::

        1 - \\frac{(u - \\bar{u}) \\cdot (v - \\bar{v})}
                  {{||(u - \\bar{u})||}_2 {||(v - \\bar{v})||}_2}

    where :math:`\\bar{u}` is the mean of the elements of `u`
    and :math:`x \\cdot y` is the dot product of :math:`x` and :math:`y`.

    Parameters
    ----------
    u : (N,) array_like
        Input array.
    v : (N,) array_like
        Input array.
    w : (N,) array_like, optional
        The weights for each value in `u` and `v`. Default is None,
        which gives each value a weight of 1.0

    Returns
    -------
    correlation : double
        The correlation distance between 1-D array `u` and `v`.

    """
    u = _validate_vector(u)
    v = _validate_vector(v)
    if w is not None:
        w = _validate_weights(w)
    if centered:
        umu = np.average(u, weights=w)
        vmu = np.average(v, weights=w)
        u = u - umu
        v = v - vmu
    uv = np.average(u * v, weights=w)
    uu = np.average(np.square(u), weights=w)
    vv = np.average(np.square(v), weights=w)
    dist = 1.0 - uv / np.sqrt(uu * vv)
    return dist
Most of the time the return value is nan, because uu = 0.0 and vv = 0.0 for all-zero rows, so the final division is 0/0.
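To make the nan case concrete, here is a minimal sketch that re-implements only the unweighted, centered branch of the formula with plain NumPy. The function name correlation_distance and the sample row other_row are made up for illustration; this is not SciPy's own code path, only the same arithmetic.

```python
import warnings

import numpy as np


def correlation_distance(u, v):
    """Unweighted, centered correlation distance (illustrative re-implementation)."""
    u = np.asarray(u, dtype=float).ravel()
    v = np.asarray(v, dtype=float).ravel()
    # Center both vectors on their own mean.
    u = u - u.mean()
    v = v - v.mean()
    uv = np.mean(u * v)
    uu = np.mean(np.square(u))
    vv = np.mean(np.square(v))
    with warnings.catch_warnings():
        # 0/0 on NumPy scalars emits a RuntimeWarning and yields nan.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return 1.0 - uv / np.sqrt(uu * vv)


zero_row = np.zeros(7)                                        # like the rows shown above
other_row = np.array([1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0])     # hypothetical non-constant row

d_zero = correlation_distance(zero_row, zero_row)    # both variances are 0
d_mixed = correlation_distance(zero_row, other_row)  # one variance is 0
d_self = correlation_distance(other_row, other_row)  # well defined
```

Any constant row (all zeros included) has zero variance after centering, so every distance involving such a row comes out as nan, regardless of the other row.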
My problem is that this calculation takes far too long for the 13000 rows. It has been running for more than 15 hours (8th-gen i5, 4 cores, 12 GB RAM, Ubuntu). Is there any way to speed up this huge calculation? I am considering Cythonizing the code, then compiling and running it. Would that help, and if so, how do I do it?
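For comparison with the row-by-row loop, here is a rough sketch of the same distance computed for all row pairs at once with NumPy. It assumes that every pairwise distance is actually needed and that the dense result fits in memory (13000 x 13000 float64 values is roughly 1.35 GB). The function name pairwise_correlation_distance is hypothetical, and this has not been timed on the real data.

```python
import numpy as np


def pairwise_correlation_distance(matrix):
    """Correlation distance between every pair of rows (illustrative sketch).

    Accepts a dense array or any object with a .toarray() method,
    such as a scipy.sparse matrix.
    """
    if hasattr(matrix, "toarray"):
        dense = np.asarray(matrix.toarray(), dtype=float)
    else:
        dense = np.asarray(matrix, dtype=float)
    # Center each row on its own mean.
    centered = dense - dense.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Constant rows have norm 0, so they become nan here,
        # matching the nan results of the pairwise loop.
        unit = centered / norms[:, None]
        return 1.0 - unit @ unit.T


# Small made-up example: 5 rows of 7 values, with one all-zero row.
rng = np.random.default_rng(0)
sample = rng.random((5, 7))
sample[2] = 0.0
distances = pairwise_correlation_distance(sample)
```

With only 7 columns, the whole job reduces to one matrix product, so the time spent in Python-level calls per pair (getrow, todense, input validation) disappears. Whether Cython is still worth trying after that is an open question here.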