
Suppose I have a 2d (square) matrix and a function:

import numpy as np
data = np.random.rand(10000, 10000)

def func(v1, v2):
    n1, n2 = np.linalg.norm(v1), np.linalg.norm(v2)
    return(np.dot(v1, v2) / (n1 * n2))

I want to calculate 'func' for each pair of rows in 'data' and store the results in an output matrix 'out', i.e. the equivalent of:

out = np.empty((data.shape[0], data.shape[0]))  # one entry per (row i, row j) pair
for i in range(data.shape[0]):
    for j in range(data.shape[0]):
        out[i, j] = func(data[i, :], data[j, :])

Obviously the above is very slow. What is the most efficient and (num)pythonic way of iterating over pairs (tuples in general) of rows in an array like this, given that 'func' is an arbitrary R^n x R^n -> R function?
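For an arbitrary 'func' that only accepts two 1-D vectors, one option is np.vectorize with a gufunc-style signature: it broadcasts the stacked rows against each other and calls 'func' once per (i, j) pair. It is still a Python-level loop, so it mainly buys cleaner code rather than speed. A minimal sketch reusing the question's 'func'; the name 'pairwise' and the small random input are illustrative assumptions:

```python
import numpy as np

def func(v1, v2):
    # Cosine of the angle between two 1-D vectors (the question's example)
    n1, n2 = np.linalg.norm(v1), np.linalg.norm(v2)
    return np.dot(v1, v2) / (n1 * n2)

# "(n),(n)->()": func consumes two length-n vectors and returns a scalar;
# all leading axes are broadcast and looped over by np.vectorize.
pairwise = np.vectorize(func, signature="(n),(n)->()")

rng = np.random.default_rng(0)
data = rng.random((5, 3))

# (5, 1, 3) vs (1, 5, 3) broadcasts to a (5, 5) grid: out[i, j] = func(data[i], data[j])
out = pairwise(data[:, None, :], data[None, :, :])
print(out.shape)  # (5, 5)
```

The same call works unchanged for any 'func' with that vector-in, scalar-out shape.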

num3ri
Spine Feast
  • Is `func` the actual function you are working with, or just a demo one? – Divakar Aug 02 '19 at 14:35
  • @Divakar It's an example cosine-of-the-angle-between-two-vectors function. Wanted to use something more complicated than a dot product since that gets solved with np.inner(data, data) – Spine Feast Aug 02 '19 at 14:39
  • @Divakar although this one can be reduced to just that by scaling the vectors first of course – Spine Feast Aug 02 '19 at 14:52
  • [related question](https://stackoverflow.com/questions/17627219/whats-the-fastest-way-in-python-to-calculate-cosine-similarity-given-sparse-mat) – Tarifazo Aug 02 '19 at 15:08
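As noted in the comments, this particular 'func' reduces to a plain inner product once every row is scaled to unit length, so the whole matrix comes from one vectorized call instead of n^2 Python calls. A sketch; 'cosine_similarity_matrix' is a hypothetical helper name, and rows with zero norm would produce nan:

```python
import numpy as np

def cosine_similarity_matrix(data):
    # Divide each row by its Euclidean norm; keepdims=True keeps shape (n, 1)
    # so the norms broadcast across the columns of each row.
    unit = data / np.linalg.norm(data, axis=1, keepdims=True)
    # Inner products of unit vectors are cosines: out[i, j] = cos(angle(data[i], data[j]))
    return np.inner(unit, unit)

rng = np.random.default_rng(0)
data = rng.random((4, 6))
out = cosine_similarity_matrix(data)
```

For the question's 10000 x 10000 input, 'data', 'unit' and 'out' are each about 800 MB of float64, so memory rather than speed becomes the main constraint.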

1 Answer


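A more generic approach, if you have a general function, is to use np.fromiter, which builds the array directly from an iterator without an intermediate list. Note that 'func' is still called once per pair in Python, so the gain over an explicit for loop comes from less loop overhead, not from vectorizing 'func':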

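import itertools
import numpy as np

n = 4
data = np.random.random((n, n))

def func(tup):
    # Unpack the (row_i, row_j) pair produced by itertools.product
    v1, v2 = tup
    n1, n2 = np.linalg.norm(v1), np.linalg.norm(v2)
    return np.dot(v1, v2) / (n1 * n2)

# product yields pairs in row-major order, so reshape(n, n) puts func((data[i], data[j])) at out[i, j]
out = np.fromiter(map(func, itertools.product(data, data)), dtype=float, count=n * n).reshape(n, n)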

print(repr(out))  # example output; the values depend on the random data
>>array([[1.        , 0.57588563, 0.44980109, 0.93490176],
       [0.57588563, 1.        , 0.71004626, 0.6908402 ],
       [0.44980109, 0.71004626, 1.        , 0.68118222],
       [0.93490176, 0.6908402 , 0.68118222, 1.        ]])
Tarifazo
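Because the example 'func' is symmetric (func(a, b) == func(b, a)), about half of the n^2 calls in the fromiter version above are redundant. A sketch, assuming 'func' is symmetric and keeping the tuple-argument 'func' from the answer; 'pairwise_symmetric' is a hypothetical name. It evaluates only the pairs with i <= j via itertools.combinations_with_replacement and mirrors them into the lower triangle:

```python
import itertools
import numpy as np

def func(tup):
    v1, v2 = tup
    n1, n2 = np.linalg.norm(v1), np.linalg.norm(v2)
    return np.dot(v1, v2) / (n1 * n2)

def pairwise_symmetric(data, func):
    """Fill out[i, j] = func((data[i], data[j])), calling func only for i <= j.

    Assumes func is symmetric in its two arguments.
    """
    n = data.shape[0]
    # np.triu_indices lists the (i, j), i <= j, positions in the same row-major
    # order in which combinations_with_replacement yields the row pairs.
    iu = np.triu_indices(n)
    pairs = itertools.combinations_with_replacement(data, 2)
    vals = np.fromiter(map(func, pairs), dtype=float, count=len(iu[0]))
    out = np.empty((n, n))
    out[iu] = vals              # upper triangle, including the diagonal
    out[iu[1], iu[0]] = vals    # mirror into the lower triangle
    return out

n = 4
data = np.random.random((n, n))
out = pairwise_symmetric(data, func)
```

This makes n(n+1)/2 calls instead of n^2, but it is still a Python-level loop, so it only roughly halves the runtime of the fromiter version.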