2

I have 2 tensors of unequal size

a = torch.tensor([[1,2], [2,3],[3,4]])
b = torch.tensor([[4,5],[2,3]])

I want a boolean array indicating whether each row of one tensor exists in the other tensor, without iterating. Something like

a in b

and the result should be

[False, True, False]

since only a[1] is in b.

Adam Williams

6 Answers

2

I think it's impossible without using at least some type of iteration. The most succinct way I can manage is a list comprehension:

[i in b for i in a]

This checks, for each row of a, whether it is in b, and gives [False, True, False]. It can also be reversed, [i in a for i in b], to check each row of b against a, which gives [False, True].
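As a quick check, both directions can be run on the tensors from the question. This is only a minimal sketch: the names `a_in_b` and `b_in_a` are mine, not from the answer, and `in` on tensors counts any single equal element as a match, which happens to give the right result for this particular data.

```python
import torch

a = torch.tensor([[1, 2], [2, 3], [3, 4]])
b = torch.tensor([[4, 5], [2, 3]])

# For each row of a: is it found in b?
a_in_b = [i in b for i in a]
# Reversed: for each row of b, is it found in a?
b_in_a = [i in a for i in b]

print(a_in_b)  # [False, True, False]
print(b_in_a)  # [False, True]
```

Each result has one entry per row of the tensor being iterated over, so the two lists have different lengths.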

AverageHomosapien
    This can be simplified to `[i in b for i in a]` since the `in` keyword returns a boolean. – Dash Nov 25 '22 at 04:36
2

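This should work (for tensors with two columns):

result = []
cols = b.T.numpy()  # cols[0] is b's first column, cols[1] its second
for i in a:
    row = i.numpy()
    try:
        # among the rows of b whose first column equals row[0],
        # check whether any second column equals row[1]
        result.append(bool(max(row[1] == cols[1, row[0] == cols[0, :]])))
    except ValueError:  # max() of an empty selection: no first-column match
        result.append(False)
result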
yuri
1

Neither of the solutions that use tensor in tensor work in all cases for the OP. If the tensors contain elements/tuples that match in at least one dimension, the aforementioned operation will return True for those elements, potentially leading to hours of debugging. For example:

torch.tensor([2,5]) in torch.tensor([2,10]) # returns True
torch.tensor([5,2]) in torch.tensor([5,10]) # returns True

A solution is to force the equality check in every dimension and then combine the results with a Boolean AND. Note that the following two methods may not be very efficient, because iterating over tensors and checking equality is rather slow, so converting to NumPy may be needed for large data:

[bool(torch.all(i == b, dim=1).any()) for i in a] # OR
[any((i[0] == b[:, 0]) & (i[1] == b[:, 1])) for i in a]

That being said, @yuri's solution also seems to work for these edge cases, but it still seems to fail occasionally, and it is rather unreadable.
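To my understanding, `x in t` on tensors behaves like `(x == t).any()`, which is why a single matching element is enough to return True. The sketch below contrasts that with a strict check that requires one row of the other tensor to match in every column. `row_in` is a hypothetical helper name, and the small tensors `x` and `t` are made up for illustration.

```python
import torch

def row_in(row, table):
    """Return True only if `row` equals some row of `table` in every column."""
    # (row == table) broadcasts to table's shape; all(dim=1) demands a full-row match
    return bool((row == table).all(dim=1).any())

x = torch.tensor([2, 5])
t = torch.tensor([[2, 10], [7, 5]])

loose = x in t          # any single equal element counts
strict = row_in(x, t)   # every column of one row must match

print(loose, strict)    # True False

a = torch.tensor([[1, 2], [2, 3], [3, 4]])
b = torch.tensor([[4, 5], [2, 3]])
result = [row_in(i, b) for i in a]
print(result)           # [False, True, False]
```

Note that `t` contains a 2 in its first column and a 5 in its second, but in different rows, so the strict check correctly rejects `[2, 5]`.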

0

If you need to compare all subtensors across the first dimension of a, use in:

>>> [i in b for i in a]
[False, True, False]
iacob
0

I recently encountered this issue too, though my goal was to select the row sub-tensors that are not "in" the other tensor. My solution is to first convert the tensors to pandas DataFrames, then use .drop_duplicates() and a left merge. More specifically, for OP's problem, one can do:

import pandas as pd
import torch

tensor1 = torch.tensor([[1, 2], [2, 3], [3, 4]])
tensor2 = torch.tensor([[4, 5], [2, 3]])

tensor1_df = pd.DataFrame(tensor1.numpy())
# drop duplicate rows of tensor2 so the merge cannot multiply rows of tensor1
tensor2_df = pd.DataFrame(tensor2.numpy()).drop_duplicates()
tensor2_df['val'] = True
# left merge on all data columns: rows of tensor1 missing from tensor2 get NaN in 'val'
merged = tensor1_df.merge(tensor2_df, how='left', on=list(tensor1_df.columns))
tensor1_in_tensor2 = torch.from_numpy(merged['val'].notna().to_numpy())
tensor1_notin_tensor2 = ~tensor1_in_tensor2
user48867
0

Unless I've messed something up, an element-wise 'in' check which treats the rows or subtensors as elements can be done like this:

(b[:,None]==a).all(dim=-1).any(dim=0)

b[:,None] adds a dimension to each "row" of 'b' so that it can be broadcast and compared element by element with every "row" of 'a'. The comparison produces 2 sub-tensors along the 0th dimension (one per row of 'b'), each the same size as 'a': the first sub-tensor compares b[0,0] with a[0,0], a[1,0], and a[2,0], and b[0,1] with a[0,1], a[1,1], and a[2,1]; the second sub-tensor similarly compares b[1,0] and b[1,1] with the rows of 'a'.

So, in the last dimension, any entry that is all True marks a row of 'a' that matches that row of 'b' exactly. Applying .all(dim=-1) therefore gives, for the first element of the first dimension, whether b[0] equals each of a[0], a[1], and a[2], and for the second element the same for b[1].

Then, to get to a in b, simply apply .any(dim=0) to combine the results across the rows of 'b', which gives tensor([False, True, False]).
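The intermediate shapes are easier to follow when the one-liner is split into steps. The following is a minimal sketch of the same broadcast comparison; the function name `rows_in` and the intermediate variable names are my own, chosen for illustration.

```python
import torch

def rows_in(a, b):
    """For each row of `a`, report whether an identical row exists in `b`."""
    eq = b[:, None] == a          # shape (len(b), len(a), n_cols)
    row_match = eq.all(dim=-1)    # shape (len(b), len(a)): b[j] equals a[i]
    return row_match.any(dim=0)   # shape (len(a),): a[i] equals some b[j]

a = torch.tensor([[1, 2], [2, 3], [3, 4]])
b = torch.tensor([[4, 5], [2, 3]])

eq = b[:, None] == a
print(tuple(eq.shape))   # (2, 3, 2)

mask = rows_in(a, b)
print(mask.tolist())     # [False, True, False]
print(a[mask].tolist())  # [[2, 3]]
```

Because the comparison materializes a tensor with len(b) * len(a) * n_cols elements, memory use grows with the product of the two row counts, which may matter for large inputs.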

Mark Z.