2

In the following code we calculate magnitudes of vectors between all pairs of given points. To speed up this operation in NumPy we can use broadcasting

import numpy as np
points = np.random.rand(10,3)

pair_vectors = points[:,np.newaxis,:] - points[np.newaxis,:,:]
pair_dists = np.linalg.norm(pair_vectors,axis=2).shape

or outer product iteration

it = np.nditer([points,points,None], flags=['external_loop'], op_axes=[[0,-1,1],[-1,0,1],None])
for a,b,c in it:
    c[...] = b - a
pair_vectors = it.operands[2]
pair_dists = np.linalg.norm(pair_vectors,axis=2)

My question is how could one use broadcasting or outer product iteration to create an array with the form 10x10x6 where the last axis contains the coordinates of both points in a pair (extension). And in a related way, is it possible to calculate pair distances using broadcasting or outer product iteration directly, i.e. produce a matrix of form 10x10 without first calculating the difference vectors (reduction).

To clarify, the following code creates the desired matrices using slow looping.

pair_coords = np.zeros(10,10,6)
pair_dists = np.zeros(10,10)
for i in range(10):
    for j in range(10):
        pair_coords[i,j,0:3] = points[i,:]
        pair_coords[i,j,3:6] = points[j,:]
        pair_dists[i,j] = np.linalg.norm(points[i,:]-points[j,:])

This is a failed attempt to calculate distanced (or apply any other function that takes 6 coordinates of both points in a pair and produce a scalar) using outer product iteration.

res = np.zeros((10,10))
it = np.nditer([points,points,res], flags=['reduce_ok','external_loop'], op_axes=[[0,-1,1],[-1,0,1],None])
for a,b,c in it: c[...] = np.linalg.norm(b-a)
pair_dists = it.operands[2]
liberias
  • 235
  • 1
  • 6
  • `pair_vectors` is 10x10x3. I don't understand what the 10x10x6 would contain. – hpaulj Nov 13 '16 at 03:55
  • 1
    This `norm` is: `np.sqrt(np.sum(pair_vectors**2,axis=2))`. It squares all differences; sums on the size 3 dimension and takes the sqrt. – hpaulj Nov 13 '16 at 03:55
  • 1
    `nditer` is a bit like the list `zip`. It coordinates the iteration over multiple arrays. It also goes beyond `zip` in handling `broadcasting`. But you have use it in `c` code (`cython`) to get any real speed improvements. – hpaulj Nov 13 '16 at 17:48
  • I ended up using loops in fortran and f2py, it's really fast. I thought NumPy was an adequate environment for post processing, but it turns out to be quite complicate to vectorize simple loop operations. I guess NumPy is useful when there is a straightforward way to vectorize you function. – liberias Nov 13 '16 at 23:29

1 Answers1

1

Here's an approach to produce those arrays in vectorized ways -

from itertools import product
from scipy.spatial.distance import pdist, squareform

N = points.shape[0]

# Get indices for selecting rows off points array and stacking them 
idx = np.array(list(product(range(N),repeat=2)))
p_coords = np.column_stack((points[idx[:,0]],points[idx[:,1]])).reshape(N,N,6)

# Get the distances for upper triangular elements. 
# Then create a symmetric one for the final dists array.
p_dists = squareform(pdist(points))

Few other vectorized approaches are discussed in this post, so have a look there too!

Community
  • 1
  • 1
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • pdist is using loops in c, and it doesn't allow to take periodic boundaries into acount. – liberias Nov 13 '16 at 23:22
  • @liberias Sorry what boundaries are we talking about here? Could you elaborate? – Divakar Nov 13 '16 at 23:23
  • Say you want to calculate difference vectors and then apply periodic boundaries to them, meaning that if any of coordinates in the difference vector exceeds some max box size (boundary) then you need to either subtract or add the size of the box. This can be easily done with broadcasting and then using sign function and conditioning. I think it's not possible with pdist which renders it quite useless for me. But anyway pdist is just a double loop compiled in C which is then run in Python, it doesn't solve the problem using NumPy. – liberias Nov 13 '16 at 23:35