16

The normal way to map a function over a numpy.ndarray, such as np.array(list(map(some_func, x))) or np.vectorize(f)(x), can't provide an index. The following code is just a simple example of a pattern that is commonly seen in many applications.

dis_mat = np.zeros([feature_mat.shape[0], feature_mat.shape[0]])

for i in range(feature_mat.shape[0]):
    for j in range(i, feature_mat.shape[0]):
        dis_mat[i, j] = np.linalg.norm(
            feature_mat[i, :] - feature_mat[j, :]
        )
        dis_mat[j, i] = dis_mat[i, j]

Is there a way to speed it up?


Thank you for your help! The quickest way to speed up this code turned out to be the function that @user2357112 pointed to in the comments:

    from scipy.spatial.distance import pdist, squareform
    dis_mat = squareform(pdist(feature_mat))

@Julien's method is also good if feature_mat is small, but when feature_mat is 1000 by 2000 it needs nearly 40 GB of memory.
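For a rough sense of where that memory goes, here is a back-of-the-envelope sketch (assuming float64 data; the exact peak depends on how many temporaries NumPy allocates):

n, d = 1000, 2000
diff_bytes = n * n * d * 8      # the (n, n, d) difference array alone
print(diff_bytes / 1e9)         # ~16 GB; np.linalg.norm allocates comparable
                                # temporaries on top of this while reducing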

– Stephen Wang
  • There is a package named `numba` which can compile python code and works with `numpy`. Another option is to fully vectorize your code fragment using only `numpy` matrix / array operations – Marco Nov 30 '17 at 04:50
  • You don't need to compute the diagonal for one thing :) Start the j loop with i + 1. – Mad Physicist Nov 30 '17 at 04:51
  • [Are you looking for this?](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html) – user2357112 Nov 30 '17 at 04:52
  • This is some sort of symmetric distance matrix? – Mateen Ulhaq Nov 30 '17 at 04:53
  • @Marco. numba sounds like overkill. This looks like it can be vectorized directly, or at least that's my hunch. My gut says to look at ufunc.outer. – Mad Physicist Nov 30 '17 at 04:53
  • @MadPhysicist, I agree that it is better to vectorize. But I think `numba` better answers the question of whether it is possible to speed up a loop in Python – Marco Nov 30 '17 at 04:59
  • @MadPhysicist: Vectorizing it directly will most likely end up with very large intermediate arrays, though, larger than either the input or the output. – user2357112 Nov 30 '17 at 05:04
  • As an aside, the "normal way" to apply a function over an array should not be `map` or `vectorize`; those should be reserved for cases where it's just not possible to apply more efficient methods, or where you have to get something written fast and the performance hit won't be a problem. – user2357112 Nov 30 '17 at 05:09
  • user167122, the etiquette here is that you don't put answers into your question. Would you consider reverting your last edit to the question (revision 5)? – David Z Nov 30 '17 at 09:59
  • If you consider user2357112's solution best, why did you accept Julien's answer? – Barmar Nov 30 '17 at 20:31

6 Answers

18

SciPy comes with a function specifically to compute the kind of pairwise distances you're computing. It's scipy.spatial.distance.pdist, and it produces the distances in a condensed format that basically only stores the upper triangle of the distance matrix, but you can convert the result to square form with scipy.spatial.distance.squareform:

from scipy.spatial.distance import pdist, squareform

distance_matrix = squareform(pdist(feature_mat))

This has the benefit of avoiding the giant intermediate arrays required by a direct vectorized solution, so it's faster and works on larger inputs. It does lose on timing to an approach that uses algebraic manipulations to have dot handle the heavy lifting, though.

pdist also supports a wide variety of alternate distance metrics, if you decide you want something other than Euclidean distance.

# Manhattan distance!
distance_matrix = squareform(pdist(feature_mat, 'cityblock'))

# Cosine distance!
distance_matrix = squareform(pdist(feature_mat, 'cosine'))

# Correlation distance!
distance_matrix = squareform(pdist(feature_mat, 'correlation'))

# And more! Check out the docs.
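As a quick sanity check of the condensed vs. square layouts (a sketch, assuming feature_mat has shape (n, d)):

n = feature_mat.shape[0]
condensed = pdist(feature_mat)                   # shape (n*(n-1)/2,): upper triangle, no diagonal
assert condensed.shape == (n * (n - 1) // 2,)
assert squareform(condensed).shape == (n, n)     # back to the full symmetric matrix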
– user2357112
14

You can create a new axis and broadcast:

dis_mat = np.linalg.norm(feature_mat[:,None] - feature_mat, axis=-1)
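Note that this builds an (n, n, d) intermediate before the norm reduces it to (n, n), which is where the memory cost mentioned in the question comes from for large inputs. A quick way to see it (a sketch):

diff = feature_mat[:, None] - feature_mat
print(diff.shape)    # (n, n, d): n*n*d elements exist at once before the reduction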

Timing:

feature_mat = np.random.rand(100,200)

def a():
    dis_mat = np.zeros([feature_mat.shape[0], feature_mat.shape[0]])
    for i in range(feature_mat.shape[0]):
        for j in range(i, feature_mat.shape[0]):
            dis_mat[i, j] = np.linalg.norm(
                feature_mat[i, :] - feature_mat[j, :]
            )
            dis_mat[j, i] = dis_mat[i, j]

def b():
    dis_mat = np.linalg.norm(feature_mat[:,None] - feature_mat, axis=-1)



%timeit a()
100 loops, best of 3: 20.5 ms per loop

%timeit b()
100 loops, best of 3: 11.8 ms per loop
– Julien
8

Factor out what can be precomputed, and use np.dot to do the heavy lifting on a k x k matrix, using only a small amount of extra memory (k x k). This relies on the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2*x.y:

def c(m):
    xy = np.dot(m, m.T)                   # all pairwise dot products, O(k^3)
    x2 = y2 = (m * m).sum(1)              # squared norms of the rows, O(k^2)
    d2 = np.add.outer(x2, y2) - 2 * xy    # squared distances, O(k^2)
    d2.flat[::len(m) + 1] = 0             # zero the diagonal (rounding issues)
    return np.sqrt(d2)                    # O(k^2)

And for comparison:

def d(m):
    return squareform(pdist(m))
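A quick check that the two functions agree (a sketch using random data; assumes c and d as defined above):

import numpy as np
from scipy.spatial.distance import pdist, squareform

m = np.random.rand(500, 200)
print(np.allclose(c(m), d(m)))   # expected: True, up to floating-point rounding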

Here are the timeit results for k x k initial matrices:

[Timing plot: c(m) vs. squareform(pdist(m)) for k x k input matrices]

Both algorithms are O(k^3), but c(m) does the O(k^3) part of the work through np.dot, the core routine of linear algebra, which benefits from all the available optimizations (multiple cores and so on). pdist, as can be seen in its source, is just loops.

This explains the roughly 15x factor for big arrays, even though pdist exploits the symmetry of the matrix by computing only half of the terms.

– B. M.
  • Oh yeah, this solution. I'm pretty sure I've seen it in another answer before, but I didn't remember it. No giant intermediates, and you get a BLAS matrix multiply for a good chunk of the work (most of it?). Performance should be pretty good. I don't remember whether it beat `pdist`, but I wouldn't be surprised either way; last time I checked, `pdist` wasn't anywhere near as optimized as a BLAS matrix multiply. – user2357112 Nov 30 '17 at 08:07
  • I might be remembering it from [this answer](https://stackoverflow.com/questions/37009647/compute-pairwise-distance-in-a-batch-without-replicating-tensor-in-tensorflow), but I feel like I saw a NumPy version. Anyway, my timings say it's competitive with `pdist`, winning on larger arrays but losing on smaller arrays. With a bit more optimization, it could probably beat `pdist` on smaller inputs. – user2357112 Nov 30 '17 at 08:17
  • (`squareform`ing the `pdist` output takes long enough to tie with this answer on the small inputs, though.) – user2357112 Nov 30 '17 at 08:20
2

One way I thought of to avoid mixing NumPy and for loops would be to create an index array using a version of this index creator that allows for replacement:

import numpy as np
from itertools import chain, combinations_with_replacement
from scipy.special import comb

def comb_index(n, k):
    # Number of k-combinations of n items, with replacement
    count = comb(n, k, exact=True, repetition=True)
    index = np.fromiter(chain.from_iterable(
                            combinations_with_replacement(range(n), k)),
                        int, count=count*k)
    return index.reshape(-1, k)
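For example, a small illustration of what the index creator produces (a sketch using the comb_index above):

comb_index(3, 2)
# array([[0, 0],
#        [0, 1],
#        [0, 2],
#        [1, 1],
#        [1, 2],
#        [2, 2]])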

Then we simply take each of those row pairs, compute the difference between them, reshape the resulting array, and take the norm of each of its rows:

reshape_mat = np.diff(feature_mat[comb_index(feature_mat.shape[0], 2), :], axis=1).reshape(-1, feature_mat.shape[1])
dis_list = np.linalg.norm(reshape_mat, axis=-1)

Note that dis_list is simply a list of all of the n*(n+1)/2 possible norms. This runs at close to the same speed as the broadcasting answer for the feature_mat used there, and when comparing the byte sizes of our largest intermediates,

(feature_mat[:,None] - feature_mat).nbytes == 16000000

while

np.diff(feature_mat[comb_index(feature_mat.shape[0], 2), :], axis=1).reshape(-1, feature_mat.shape[1]).nbytes == 8080000

For most inputs, mine uses only half the storage: still suboptimal, but a marginal improvement.
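If you do need the full square matrix, the same index array can scatter dis_list back into place (a sketch, assuming comb_index and dis_list as defined above):

idx = comb_index(feature_mat.shape[0], 2)
dis_mat = np.empty((feature_mat.shape[0],) * 2)
dis_mat[idx[:, 0], idx[:, 1]] = dis_list
dis_mat[idx[:, 1], idx[:, 0]] = dis_list   # mirror into the lower triangle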

– Sebastian Mendez
2

Based on np.triu_indices, in case you really want to do this with pure NumPy:

s = feature_mat.shape[0]
i, j = np.triu_indices(s, 1)         # All possible combinations of indices
dist_mat = np.empty((s, s))          # Don't waste time filling with zeros
np.einsum('ii->i', dist_mat)[:] = 0  # When you can just fill the diagonal
dist_mat[i, j] = dist_mat[j, i] = np.linalg.norm(feature_mat[i] - feature_mat[j], axis=-1)
                                     # Vectorized version of your original process

The benefit of this method over broadcasting is that you can do it in chunks:

n = 10000000   # Chunk size, based on your available RAM
for k in range(0, i.size, n):
    i_ = i[k: k + n]
    j_ = j[k: k + n]
    dist_mat[i_, j_] = dist_mat[j_, i_] = \
        np.linalg.norm(feature_mat[i_] - feature_mat[j_], axis=-1)
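If SciPy is available, a quick way to verify the chunked result (a sketch):

import numpy as np
from scipy.spatial.distance import pdist, squareform

assert np.allclose(dist_mat, squareform(pdist(feature_mat)))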
– Daniel F
-1

Let's begin by rewriting this in terms of a function:

def dist(mat, i, j):
    return np.linalg.norm(mat[i, :] - mat[j, :])

size = feature_mat.shape[0]
dis_mat = np.zeros((size, size))

for i in range(size):
    for j in range(size):
        dis_mat[i, j] = dist(feature_mat, i, j)

This can be rewritten in (a slightly more) vectorized form as:

v = [dist(feature_mat, i, j) for i in range(size) for j in range(size)]
dist_mat = np.array(v).reshape(size, size)

Notice that we're still relying on Python rather than NumPy for some of the computation, but it's a step towards vectorization. Also notice that dist(mat, i, j) is symmetric in i and j, so we could further reduce the computation by approximately half. Perhaps consider:

v = [dist(feature_mat, i, j) for i in range(size) for j in range(i + 1)]

Now the tricky bit is assigning these computed values to the correct elements of dist_mat; a sketch of one way to do it follows.
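One straightforward way to do that assignment, reusing the order of the comprehension above (a sketch):

dist_mat = np.zeros((size, size))
k = 0
for i in range(size):
    for j in range(i + 1):
        dist_mat[i, j] = dist_mat[j, i] = v[k]   # diagonal entries are 0 anyway
        k += 1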

How fast this performs depends on the cost of computing dist(mat, i, j). For small feature_mats, the cost of recomputing is not high enough to worry about. But for large matrices, you definitely do not want to recompute.

– Mateen Ulhaq