
I have a torch tensor of image embeddings (200,000 × 512), and I want to calculate the cosine distance between each pair of those embeddings.

Right now, I'm doing:

similarity = (image_features.cpu() @ image_features.cpu().T).squeeze()

But it takes too long. Maybe MATLAB is much faster. Can anyone tell me how to do it there? (How to convert my tensor to a .mat file, plus the MATLAB code.)

1 Answer


You can convert your torch tensor into a NumPy array, then save it to a .mat file with SciPy's savemat().

from scipy.io import savemat

# Detach from the autograd graph and move to CPU before converting;
# .numpy() fails on CUDA tensors and on tensors that require grad
arr = image_features.detach().cpu().numpy()

# savemat expects a dict mapping MATLAB variable names to arrays
savemat("filename.mat", {'a': arr})

However, MATLAB isn't always faster than Python (NumPy), especially for matrix multiplication, since NumPy is itself highly optimized. So I don't believe running the same matrix multiplication in MATLAB will speed things up. In fact, it might be faster to do the calculation in PyTorch if you have a GPU available.
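As a minimal sketch of the GPU route: the snippet below assumes `image_features` is your (200,000 × 512) tensor (a small random tensor stands in for it here), and normalizes rows first, since the plain matmul in the question only equals cosine similarity when the rows are already unit-length (as CLIP features often are).

```python
import torch

# Hypothetical stand-in for the real 200,000 x 512 feature tensor
image_features = torch.randn(1000, 512)

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
feats = image_features.to(device)

# Normalize each row to unit length so the matmul yields cosine similarity
feats = torch.nn.functional.normalize(feats, dim=1)

# One matmul on the device gives the full pairwise similarity matrix
similarity = feats @ feats.T
```

Note that the full 200,000 × 200,000 float32 result is roughly 160 GB, far more than any single GPU holds, so for the real tensor you would need to process it in pieces rather than in one matmul.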

Perhaps a more relevant question here is how to calculate the cosine similarity of large datasets?
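One hedged sketch of that: process the rows in chunks so you never materialize the full ~160 GB matrix at once, and consume each block as it is produced (e.g. keep only the top-k matches per row, or stream blocks to disk). The helper name `chunked_cosine_similarity` is hypothetical, not part of any library.

```python
import torch

def chunked_cosine_similarity(feats: torch.Tensor, chunk_size: int = 1024):
    """Yield (start, block) pairs, where block holds the cosine similarities
    between rows [start:start+chunk_size] and all rows of feats."""
    feats = torch.nn.functional.normalize(feats, dim=1)
    for start in range(0, feats.shape[0], chunk_size):
        block = feats[start:start + chunk_size] @ feats.T
        yield start, block

# Small random matrix standing in for the real features
x = torch.randn(100, 16)
rows = [block for _, block in chunked_cosine_similarity(x, chunk_size=32)]
# Only concatenate like this when the full matrix actually fits in memory
full = torch.cat(rows, dim=0)
```

With the real 200,000-row tensor you would drop the final concatenation and instead reduce each block inside the loop.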

mimocha