I have two 2D tensors, A and B. I would like to write a function find_indices(A, B) that returns a 1D tensor containing the indices of the rows in A which also appear in B. The function should also avoid for loops so the work can be parallelized. For example:

import torch

A = torch.tensor([[1, 2, 3], [2, 3, 4], [3, 4, 5]]).cuda()
B = torch.tensor([[1, 2, 3], [2, 3, 6], [2, 5, 6], [3, 4, 5]]).cuda()

indices1 = find_indices(A, B)  # tensor([0, 2])
indices2 = find_indices(B, A)  # tensor([0, 3])

assert A[indices1].equal(B[indices2])

Assume that:

  • All the rows in A and B are unique.
  • Rows in A and B are both sorted, so shared rows appear in the same relative order in A and B.
  • len(A) and len(B) are ~200k.

I have tried this method from https://stackoverflow.com/a/60494505/17495278:

values, indices = torch.topk(((A.t() == B.unsqueeze(-1)).all(dim=1)).int(), 1, 1)
indices = indices[values!=0]
# indices = tensor([0, 2])

It gives the correct answer for small-scale input, but for my use case it takes >100 GB of memory and raises a CUDA out-of-memory error. Is there another way to achieve this with a reasonable memory cost (say, under 1 GB)?
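
For context on where the memory goes: the broadcasted comparison materializes a huge intermediate boolean tensor. A rough back-of-the-envelope estimate, assuming len(A) == len(B) == 200,000 and 3 columns as above:

# (A.t() == B.unsqueeze(-1)) has shape (len(B), ncols, len(A)) and dtype bool (1 byte/element)
n_a = n_b = 200_000  # sizes from the question
ncols = 3
print(n_b * ncols * n_a / 1e9)  # ~120.0 GB, before the .all() and .int() steps even run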

1 Answer

Update: for anyone looking for the answer, here is a pure PyTorch, GPU-only method:

# Concatenate A and B and deduplicate rows. Since rows within A and within B are
# unique, any unique row with count > 1 must appear in both tensors.
_, idx, counts = torch.cat([A, B], dim=0).unique(
    dim=0, return_inverse=True, return_counts=True)
# idx maps each row of the concatenation to its unique-row id; mark ids shared by A and B.
mask = torch.isin(idx, torch.where(counts.gt(1))[0])
mask1 = mask[:len(A)]  # rows of A found in B: tensor([ True, False,  True], device='cuda:0')
mask2 = mask[len(A):]  # rows of B found in A: tensor([ True, False, False,  True], device='cuda:0')

assert A[mask1].equal(B[mask2])

This method gives True/False masks instead of indices, which are also sufficient for tensor indexing. If you really need numerical indices, they can be obtained with:

indices1 = torch.arange(len(mask1), device=mask1.device)[mask1]  # tensor([0, 2], device='cuda:0')
indices2 = torch.arange(len(mask2), device=mask2.device)[mask2]  # tensor([0, 3], device='cuda:0')
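
To match the find_indices(A, B) signature from the question, the same trick can be wrapped up like this (a sketch built directly on the code above; torch.nonzero is just one way to turn the mask into indices):

def find_indices(A, B):
    # Indices of rows of A that also appear in B (rows within A and within B are unique).
    _, idx, counts = torch.cat([A, B], dim=0).unique(
        dim=0, return_inverse=True, return_counts=True)
    mask = torch.isin(idx, torch.where(counts.gt(1))[0])
    return torch.nonzero(mask[:len(A)], as_tuple=True)[0]

indices1 = find_indices(A, B)  # tensor([0, 2], device='cuda:0')
indices2 = find_indices(B, A)  # tensor([0, 3], device='cuda:0')
assert A[indices1].equal(B[indices2])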

Original answer (with NumPy on the CPU and GPU/CPU data transfer; can be ~100x slower):

import numpy as np

A_npy = A.cpu().numpy()
B_npy = B.cpu().numpy()
nrows, ncols = A_npy.shape
# View each row as a single structured scalar so np.intersect1d compares whole rows.
dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
         'formats': ncols * [A_npy.dtype]}
_, indices1, indices2 = np.intersect1d(A_npy.view(dtype), B_npy.view(dtype),
                                       return_indices=True)

print(indices1)  # [0 2]
print(indices2)  # [0 3]
assert A[indices1].equal(B[indices2])

This method uses numpy.intersect1d and returns the results as numpy.ndarrays. PyTorch tensors support indexing with NumPy arrays anyway, so this works for me. The only remaining drawback is the overhead of data transfer between GPU and CPU.
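
If you would rather have torch tensors on the GPU again after the CPU round trip, the NumPy index arrays can be converted back explicitly; a minimal sketch, assuming A and B are still the CUDA tensors from the question:

# Copy the NumPy index arrays back to the device that A and B live on
indices1_gpu = torch.as_tensor(indices1, device=A.device)
indices2_gpu = torch.as_tensor(indices2, device=B.device)
assert A[indices1_gpu].equal(B[indices2_gpu])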

Reference: https://stackoverflow.com/a/8317403/17495278

Credit: @Toyo, @Joe
