How to make a vectorized approach for calculating pairwise Manhattan/L1 distance between multi-dimensional arrays?

Question

Let's say that I have two arrays of size (4000, 3). What I'd like to do in a vectorized fashion is to calculate the L1/Manhattan distance from each vector of the first array to every vector in the second array, so that I'd end up with a (4000, 4000) array.

My current approach is based on splitting up the (4000, 3) into 3 separate arrays of (4000, 1) and doing broadcasting (similar to here: Python alternative for calculating pairwise distance between two sets of 2d points).

However, this approach doesn't really work if I have initial matrices that are different in size, for example, (4000, 4) or (4000, 5). Then my code would break, because it's assuming there are 3 channels.

Therefore, I'd appreciate any help on creating a generalized vectorized approach that can calculate pairwise L1 distances!

Think cdist should work regardless of number of channels. – Divakar, Feb 15 '19 at 04:14
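A minimal sketch of what the comment suggests, assuming it refers to SciPy's scipy.spatial.distance.cdist with the "cityblock" (Manhattan) metric. The array sizes below are small, made-up values for illustration only; the call does not depend on the number of columns.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Small illustrative inputs: 5 and 7 vectors, each with 4 channels.
rng = np.random.default_rng(0)
X = rng.random((5, 4))
Y = rng.random((7, 4))

# "cityblock" is SciPy's name for the L1/Manhattan metric.
# D[i, j] is the L1 distance between X[i] and Y[j].
D = cdist(X, Y, metric="cityblock")
print(D.shape)
```

The two inputs only need to agree on the number of columns, so the same call should work for (4000, 3), (4000, 4), or (4000, 5) arrays.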

score 1 · Accepted Answer · answered Feb 15 '19 at 04:40

You can do the whole thing using broadcasting (if I'm understanding what you are trying to do correctly). First compute the pairwise differences of the vectors (the result has shape (N, N, k)), then sum the absolute values of those differences along the last axis, which leaves an (N, N) array.

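import numpy as np

N = 4000
k = 4  # number of channels; any value works

X = np.random.rand(N, k)
Y = np.random.rand(N, k)

# X[:, None] has shape (N, 1, k); subtracting Y (N, k) broadcasts to (N, N, k).
# Summing the absolute differences over the last axis gives the (N, N) L1 distances.
Z = np.sum(np.abs(X[:, None] - Y), axis=-1)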
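One thing to keep in mind with the broadcast above: the intermediate (N, N, k) array holds N*N*k floats, which for N = 4000 and k = 4 is roughly 512 MB in float64. If that is a concern, a possible variant is to accumulate one channel at a time, which generalizes the per-channel splitting described in the question to any k. The helper name pairwise_l1 below is hypothetical, and this is only a sketch, not a tuned implementation.

```python
import numpy as np

def pairwise_l1(X, Y):
    """Pairwise L1 distances between rows of X (N, k) and rows of Y (M, k)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.ndim != 2 or Y.ndim != 2 or X.shape[1] != Y.shape[1]:
        raise ValueError("X and Y must be 2-D with the same number of columns")

    D = np.zeros((X.shape[0], Y.shape[0]))
    # Loop over the k channels only; each step is a vectorized (N, M) operation,
    # so the (N, M, k) intermediate is never materialized.
    for j in range(X.shape[1]):
        D += np.abs(X[:, j, None] - Y[None, :, j])
    return D

# Small illustrative inputs with 5 channels.
rng = np.random.default_rng(0)
A = rng.random((6, 5))
B = rng.random((8, 5))
D = pairwise_l1(A, B)
print(D.shape)
```

The loop runs k times rather than N*N times, so it stays cheap in Python terms while keeping the working memory at a few (N, M) arrays.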
