
I have a matrix A of shape m x n and another, smaller matrix B of shape k x n. I want to calculate the Euclidean distance between each row of A and each row of B, producing a matrix C of shape m x k. I already have a function dist(row1, row2). This is trivial with loops, but is there a vectorized way to do it in NumPy?

I believe what I want can be expressed as a custom matrix-multiplication-like operation (if I transpose B). This question seems to head in the same direction, but the top answer there rearranges the operations to achieve vectorization, whereas I want to use my separate function dist(row1, row2). The second answer uses a separate function, but it also uses loops.

  • Hi, can you please include the code you're currently using, with a few example rows for each matrix? – Guglie Mar 20 '20 at 14:34
  • Does this answer your question? [NumPy Broadcasting: Calculating sum of squared differences between two arrays](https://stackoverflow.com/questions/36241608/numpy-broadcasting-calculating-sum-of-squared-differences-between-two-arrays) – Guglie Mar 20 '20 at 15:20
  • @Guglie: thanks, but I was looking for a more readable solution. `einsum` looks like black magic. If that's the only solution, I'd rather use loops in my case. – guest22654018 Mar 20 '20 at 19:03
  • I guess you want to do something like that? https://stackoverflow.com/a/42994680/4045774 – max9111 Mar 22 '20 at 11:26
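
For the case described in the question, where an existing dist(row1, row2) has to be reused unchanged, one possible sketch is to wrap it with `np.vectorize` and a `signature`. The `dist` below is a hypothetical stand-in for the asker's function, and the sample arrays are made up. Note that `np.vectorize` is a convenience wrapper: it still loops in Python internally, so this buys readability rather than speed.

```python
import numpy as np

def dist(row1, row2):
    # Hypothetical stand-in for the asker's own function:
    # Euclidean distance between two 1-D arrays of equal length.
    return np.sqrt(np.sum((row1 - row2) ** 2))

# The signature says: dist takes two length-n vectors and returns a scalar.
pairwise = np.vectorize(dist, signature="(n),(n)->()")

# Made-up sample data: m = 3 rows in A, k = 2 rows in B, n = 2 columns.
A = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
B = np.array([[0.0, 0.0], [6.0, 8.0]])

# A[:, None, :] has shape (m, 1, n); B has shape (k, n).
# The non-core dimensions broadcast to (m, k), so dist runs once per row pair.
C = pairwise(A[:, None, :], B)

print(C.shape)  # (3, 2)
print(C)
```

Here `C[i, j]` holds `dist(A[i], B[j])`, so any function with the same two-row interface could be swapped in.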

1 Answer


Try the following, which may help. If A has shape m x n and B has shape k x n, C will have shape m x k.

import numpy as np
C = np.linalg.norm(A[:, None, :] - B, axis=-1)  # no list wrapper, so C has shape (m, k)
Nik P
  • Would you mind explaining this? It doesn't seem very obvious. – guest22654018 Mar 20 '20 at 19:03
  • For the Euclidean distance between your rows, you need the difference between each row of `A` and each row of `B`. They have different sizes in the first dimension (rows) but the same size in the trailing dimension (`n` columns), so you make them [compatible](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) by inserting a length-1 axis with `None`. The difference is then an `m x k x n` array, and applying `np.linalg.norm` over the last axis (-1, of length `n`) gives the Euclidean distances. I suggest trying it on your data, or adding a sample input and the desired output to your post so it can be checked. – Nik P Mar 20 '20 at 19:34
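
The broadcasting steps described in this comment can be checked with a small sketch. The array sizes and random data below are made up for illustration, and the explicit double loop is only there as a reference to compare against.

```python
import numpy as np

# Made-up example sizes and data.
rng = np.random.default_rng(0)
m, k, n = 4, 3, 5
A = rng.normal(size=(m, n))
B = rng.normal(size=(k, n))

# Step 1: insert a length-1 axis so A has shape (m, 1, n).
A3 = A[:, None, :]

# Step 2: B (k, n) is broadcast as (1, k, n); the difference has shape (m, k, n).
diff = A3 - B

# Step 3: take the norm over the last axis (length n) to get shape (m, k).
C = np.linalg.norm(diff, axis=-1)

# Reference: the same distances computed with explicit loops.
C_loop = np.empty((m, k))
for i in range(m):
    for j in range(k):
        C_loop[i, j] = np.sqrt(np.sum((A[i] - B[j]) ** 2))

print(A3.shape, diff.shape, C.shape)  # (4, 1, 5) (4, 3, 5) (4, 3)
print(np.allclose(C, C_loop))         # True
```

One thing to keep in mind is that the intermediate `diff` array holds m * k * n values, so for large inputs memory use can become the limiting factor.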