
Say I have a 2D PyTorch tensor and a 2D NumPy boolean array as follows:

import numpy
import torch

a = torch.tensor([[ 0.,  1.,  2.],
                  [ 3.,  4.,  5.],
                  [ 6.,  7.,  8.],
                  [ 9., 10., 11.],
                  [12., 13., 14.]])

m = numpy.array([[ False,  True, False],
                 [  True, False,  True],
                 [ False,  True,  True],
                 [ False, False, False],
                 [  True, False, False]])

They have the same shape, and the number of `True`s in each column of `m` is the same.

I need to get the 2x3 tensor that is

a.transpose(0,1).masked_select(torch.from_numpy(m.transpose())).reshape(a.shape[1],-1).transpose(0,1)

which is

tensor([[ 3.,  1.,  5.],
        [12.,  7.,  8.]])

The actual tensor is very large, and the operation needs to be performed many times, so I want to ask: what is an efficient (ideally the most efficient) way of doing this?
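
For clarity, the same result can be built column by column; a minimal (but slow) sketch of what the one-liner above computes:

# per-column reformulation, for clarity only (not the fast path)
cols = [a[:, j][torch.from_numpy(m[:, j].copy())] for j in range(a.shape[1])]
torch.stack(cols, dim=1)
# tensor([[ 3.,  1.,  5.],
#         [12.,  7.,  8.]])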

jeffzz
  • why do you `transpose` everything back and forth? – Shai Mar 28 '22 at 05:18
  • Because, by default, `reshape` fills in row order (first row, then second, and so on), and applying the mask `m` flattens in the same row-major order, so `1.` would be the very first element if `a` and `m` were not transposed, which is not what I want. Each *column* has the same number of `True`s (not each row), so I transpose to turn columns into rows, apply the mask, reshape, and transpose back; see the sketch after these comments. – jeffzz Mar 28 '22 at 05:25
  • Your implementation is not correct, try your sample array *m* with only 2 *True* in each column. Wrong shape and shuffled elements. `a.gather(0, torch.from_numpy(np.nonzero(m.T)[1].reshape(-1, m.shape[1], order='F')))` is a general solution, but uses slow `np.nonzero` call. – Michael Szczesny Mar 28 '22 at 06:13
  • You are absolutely right, I messed up in the example given above. It's now edited. – jeffzz Mar 28 '22 at 18:11
  • I have some questions: why 2x3 and not 3x2? And why is 3 the first element of your desired output? 1 is in the first row, so shouldn't it be `tensor([[ 1., 3., 5.], [ 7., 8., 12.]])`? – Hamzah Mar 29 '22 at 19:24
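
A minimal sketch of the row-major flattening described in the comments, using the example `a` and `m` from the question:

a.masked_select(torch.from_numpy(m))
# tensor([ 1.,  3.,  5.,  7.,  8., 12.])
# masked_select walks the mask row by row, so without the transposes
# the per-column grouping is lost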

2 Answers


In my benchmarks, a jitted numba solution is the fastest I could find.

My benchmarks for `a`, `m` with shape (10000, 200) (all approaches produce equal result tensors):

1. @numba.jit          13.2 ms  (3.46x)
2. list comprehension  31.3 ms  (1.46x)
3. baseline            45.7 ms  (1.00x)

Generation of sufficiently large sample data for benchmarking

import torch
import numpy as np

def generate_data(rows=500, columns=100):
    a = torch.from_numpy(np.random.uniform(1, 10, (rows, columns)).astype(np.float32))

    # argsort trick by @divakar https://stackoverflow.com/a/55317373/14277722
    def shuffle_along_axis(a, axis):
        idx = np.random.rand(*a.shape).argsort(axis=axis)
        return np.take_along_axis(a, idx, axis=axis)

    # Broadcast one random 0/1 pattern to every row, shuffle each row
    # independently, then transpose -> every column of m has the same
    # number of Trues, as required by the question.
    m = shuffle_along_axis(np.full((columns, rows), np.random.randint(2, size=rows)), 1).astype('bool').T
    return a, np.ascontiguousarray(m)

a, m = generate_data(10000,200)

A jitted numba implementation

import numba as nb

@nb.njit
def gather2d(arr1, arr2):
    # one output row per True in a column (the count is the same for every column)
    res = np.zeros((np.count_nonzero(arr2[:,0]), arr1.shape[1]), np.float32)
    counter = np.zeros(arr1.shape[1], dtype=np.intp)  # next free row per output column
    for i in range(arr1.shape[0]):
        for j in range(arr1.shape[1]):
            if arr2[i,j]:
                res[counter[j], j] = arr1[i,j]  # append value to its output column
                counter[j] += 1
    return res

torch.from_numpy(gather2d(a.numpy(),m))

Output

# %timeit 10 loops, best of 5: 13.2 ms per loop
tensor([[2.1846, 7.8890, 8.8218,  ..., 4.8309, 9.2853, 6.4404],
        [5.8842, 3.7332, 6.7436,  ..., 1.2914, 3.2983, 3.5627],
        [9.5128, 2.4283, 2.2152,  ..., 4.9512, 9.7335, 9.6252],
        ...,
        [7.3193, 7.8524, 9.6654,  ..., 3.3665, 8.8926, 4.7660],
        [1.3829, 1.3347, 6.6436,  ..., 7.1956, 4.0446, 6.4633],
        [6.4264, 3.6283, 3.6385,  ..., 8.4152, 5.8498, 5.0281]])

Against a vectorized baseline solution

# %timeit 10 loops, best of 5: 45.7 ms per loop
a.gather(0, torch.from_numpy(np.nonzero(m.T)[1].reshape(-1, m.shape[1], order='F')))
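
Broken into steps (with illustrative intermediate names), the baseline builds a `(k, columns)` index matrix, assuming `k` Trues per column, and gathers along dim 0:

rows_by_col = np.nonzero(m.T)[1]                      # row indices of the Trues, grouped by column of m
idx = rows_by_col.reshape(-1, m.shape[1], order='F')  # (k, columns): column j holds that column's row indices
a.gather(0, torch.from_numpy(idx))                    # out[i, j] = a[idx[i, j], j]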

A Python list comprehension turns out to be surprisingly fast

def g(arr1, arr2):
    # mask each column of arr1 with the matching column of arr2
    return np.array([i[j] for i, j in zip(arr1.T, arr2.T)]).T

# %timeit 10 loops, best of 5: 31.3 ms per loop
torch.from_numpy(g(a.numpy(), m))
Michael Szczesny
  • Thanks for the answer! But the tensor has grad, so I cannot convert it to numpy and convert it back. But thank you for the idea of using numba! – jeffzz Mar 31 '22 at 20:24

You can try it this way, using only NumPy and PyTorch:

b, c = m.nonzero()       # row and column indices of all True entries
b = torch.tensor(b)
c = torch.tensor(c)
a[b, c].reshape(2, 3)    # fancy indexing picks out the masked values

#output

tensor([[ 1.,  3.,  5.],
        [ 7.,  8., 12.]])   # True values are taken in row-major order (along axis=1)
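
If you need the column-grouped order from the question instead (assuming, as there, that every column of `m` has the same number of Trues), a minimal variant of the same indexing idea is:

c, r = m.T.nonzero()   # enumerate the Trues column by column: c = column, r = row
a[torch.tensor(r), torch.tensor(c)].reshape(m.shape[1], -1).T
# tensor([[ 3.,  1.,  5.],
#         [12.,  7.,  8.]])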

I used the same example provided by @Michael Szczesny to measure the time:

import numpy as np
import timeit
import torch

rows, columns = (10000,200)
a = torch.from_numpy(np.random.uniform(1,10, (rows,columns)).astype(np.float32))
m = np.random.choice([False, True], size=(rows, columns))

starttime = timeit.default_timer()
b,c = m.nonzero()
b = torch.tensor(b)
c = torch.tensor(c)
a[b,c]
print(f"The time difference is :{(timeit.default_timer() - starttime)*1000} ms")

#output
The time difference is: 26.4 ms

It is better than the second and third approaches of @Michael Szczesny.

Hamzah
  • Thanks for the idea! This method actually turns out to be slower than transposing everything back and forth in my actual work. I guess it's because the `m` I have is very sparse: approximately 5 million rows, with only about 1,000 Trues in each column. – jeffzz Mar 31 '22 at 20:27