I have two 2D tensors, A and B. I would like to write a function find_indices(A, B) which returns a 1D tensor containing the indices of the rows in A that also appear in B. The function should also avoid a for loop so that the work can be parallelized. For example:
import torch
A = torch.tensor([[1, 2, 3], [2, 3, 4], [3, 4, 5]]).cuda()
B = torch.tensor([[1, 2, 3], [2, 3, 6], [2, 5, 6], [3, 4, 5]]).cuda()
indices1 = find_indices(A, B) # tensor([0, 2])
indices2 = find_indices(B, A) # tensor([0, 3])
assert A[indices1].equal(B[indices2])
Assume that:
- All the rows in A and B are unique.
- The rows in A and B are both sorted, so rows shared by both tensors appear in the same relative order in A and B.
- len(A) and len(B) are ~200k.
I have tried this method from https://stackoverflow.com/a/60494505/17495278:
# Broadcast-compare every row of B against every row of A; for each row of B,
# topk picks the index in A of a matching row (if any).
values, indices = torch.topk(((A.t() == B.unsqueeze(-1)).all(dim=1)).int(), 1, 1)
indices = indices[values != 0]
# indices = tensor([0, 2])
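If I understand the broadcasting correctly, the intermediate comparison tensor alone is far too large at my scale (a rough estimate, assuming ~200k rows per side):

n = m = 200_000  # len(A) and len(B) in my use case
d = 3            # row width
# (A.t() == B.unsqueeze(-1)) broadcasts to shape (m, d, n) with bool entries (1 byte each)
print(m * d * n / 1e9)  # ~120 GB, before .all(dim=1) even reduces it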
So while it gives the correct answer for small inputs, for my use case it takes >100 GB of memory and raises a CUDA out-of-memory error. Is there another way to achieve this with a reasonable memory cost (say, under 1 GB)?
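Would something along these lines be viable? This is only a rough sketch based on torch.unique with return_counts; it relies on the uniqueness assumption above, and I have not tested it at the 200k-row scale:

import torch

def find_indices(A, B):
    # Rows within A and within B are unique, so any row whose total
    # count in the concatenation is 2 must appear in both tensors.
    combined = torch.cat([A, B], dim=0)
    _, inverse, counts = torch.unique(
        combined, dim=0, return_inverse=True, return_counts=True
    )
    # Positions in A whose row also occurs in B, in ascending order.
    mask = counts[inverse[: len(A)]] == 2
    return mask.nonzero(as_tuple=True)[0]

Since nonzero returns positions in ascending order and shared rows keep their relative order in A and B, the assert above should still hold.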