
For computer vision training purposes, random cropping is often used as a data augmentation technique. At each iteration, a batch of random crops is generated and fed to the network being trained, so generating them needs to be efficient.

If the data has more dimensions than the desired views, random dimension selection might also be needed: for example, random frames can be selected from a video. The data can even have 4 dimensions (3 in space + time), or more.

How can one write an efficient generator of random views of lower dimension?

A very naïve version for getting 2D views from 3D data, and only one by one, could be:

import numpy as np
import numpy.random as nr

def views():
    # suppose `data` comes from elsewhere
    # data.shape is (n1, n2, n3)
    shape = data.shape
    while True:
        drop_dim = nr.randint(0, 3)
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        selector = np.zeros(shape, dtype=bool)
        if drop_dim == 0:
            selector[drop_dim_keep, :, :] = 1
        elif drop_dim == 1:
            selector[:, drop_dim_keep, :] = 1
        else:
            selector[:, :, drop_dim_keep] = 1
        # boolean indexing flattens, so restore the 2D shape of the kept slice
        yield data[selector].reshape(
            [s for d, s in enumerate(shape) if d != drop_dim])

A more elegant solution probably exists, where at least:

  • there is no ugly if/else on the randomly chosen dimension
  • views can take a batch_size integer argument and generate several views at once without a loop
  • the dimension of input/output data is not specified (e.g. can do 3D -> 2D as well as 4D -> 2D)
lejlot

1 Answer


I tweaked your function to clarify what it's doing:

def views():
    # suppose `data` comes from elsewhere
    # data.shape is (n1, n2, n3)
    shape = data.shape
    while True:
        drop_dim = nr.randint(0, 3)
        dropshape = list(shape[:])
        dropshape[drop_dim] -= 1
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        print(drop_dim, drop_dim_keep)
        selector = np.ones(shape, dtype=bool)
        if drop_dim == 0:
            selector[drop_dim_keep, :, :] = 0
        elif drop_dim == 1:
            selector[:, drop_dim_keep, :] = 0
        else:
            selector[:, :, drop_dim_keep] = 0
        yield data[selector].reshape(dropshape)

A small sample run:

In [534]: data = np.arange(24).reshape(2, 3, 4)
In [535]: data
Out[535]: 
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
In [536]: v = views()
In [537]: next(v)
2 1
Out[537]: 
array([[[ 0,  2,  3],
        [ 4,  6,  7],
        [ 8, 10, 11]],

       [[12, 14, 15],
        [16, 18, 19],
        [20, 22, 23]]])
In [538]: next(v)
0 0
Out[538]: 
array([[[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

So it's picking one of the dimensions, and for that dimension dropping one 'column'.

The main efficiency issue is whether it returns a view or a copy. Boolean (advanced) indexing always produces a copy, so in this case it has to return one.
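As a quick sanity check (np.shares_memory and this standalone mask are just illustrations, built like the one inside the generator above):

mask = np.ones(data.shape, dtype=bool)
mask[:, :, 1] = 0                      # drop one 'column', as the generator does
np.shares_memory(data, data[mask])     # False: boolean indexing makes a copy
np.shares_memory(data, data[:, :, 1])  # True: basic slicing is a view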

You are using a boolean mask to select the return, exactly the same as what np.delete does in this case.

In [544]: np.delete(data,1,2).shape
Out[544]: (2, 3, 3)
In [545]: np.delete(data,0,0).shape
Out[545]: (1, 3, 4)

So you could replace much of your internals with delete, letting it take care of generalizing the dimensions. Look at its code to see how it handles those details (it isn't short and sweet!).

def rand_delete():
    # suppose `data` comes from elsewhere
    # data.shape is (n1, n2, n3)
    shape = data.shape
    while True:
        drop_dim = nr.randint(0, 3)
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        print(drop_dim, drop_dim_keep)
        yield np.delete(data, drop_dim_keep, drop_dim)

In [547]: v1=rand_delete()
In [548]: next(v1)
0 1
Out[548]: 
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]]])
In [549]: next(v1)
2 0
Out[549]: 
array([[[ 1,  2,  3],
        [ 5,  6,  7],
        [ 9, 10, 11]],

       [[13, 14, 15],
        [17, 18, 19],
        [21, 22, 23]]])

Replace the delete with take:

def rand_take():
    # suppose `data` comes from elsewhere
    shape = data.shape
    while True:
        take_dim = nr.randint(0, 3)
        take_keep = nr.randint(0, shape[take_dim])
        print(take_dim, take_keep)
        yield np.take(data, take_keep, axis=take_dim)

In [580]: t = rand_take()
In [581]: next(t)
0 0
Out[581]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [582]: next(t)
2 3
Out[582]: 
array([[ 3,  7, 11],
       [15, 19, 23]])

np.take returns a copy, but the equivalent slicing does not:

In [601]: data.__array_interface__['data']
Out[601]: (182632568, False)
In [602]: np.take(data,0,1).__array_interface__['data']
Out[602]: (180099120, False)
In [603]: data[:,0,:].__array_interface__['data']
Out[603]: (182632568, False)

A slicing tuple can be generated with expressions like

In [604]: idx = [slice(None)]*data.ndim
In [605]: idx[1] = 0
In [606]: data[tuple(idx)]
Out[606]: 
array([[ 0,  1,  2,  3],
       [12, 13, 14, 15]])

Various numpy functions that take an axis parameter construct an indexing tuple like this (for example, one or more of the apply... functions).
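Putting the slicing-tuple idea together with a random axis and index, a dimension-agnostic generator that returns true views might look like the following sketch (the rand_view name, the batch_size handling, and the reliance on a global data are illustrative assumptions, not something fixed by the above):

def rand_view(batch_size=1):
    # yields lists of `batch_size` views, each with one dimension removed
    while True:
        batch = []
        for _ in range(batch_size):
            axis = nr.randint(0, data.ndim)
            keep = nr.randint(0, data.shape[axis])
            idx = [slice(None)] * data.ndim
            idx[axis] = keep                 # an integer index drops that axis
            batch.append(data[tuple(idx)])   # basic indexing -> a view, no copy
        yield batch

Each yielded element is a genuine view into data; stacking a batch into one array (e.g. with np.stack) would force a copy, which is why this sketch yields a list instead.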

hpaulj
  • Hum, ok, I see what I did there: I messed up between the question (what I meant) and the code. The question is "keep only one random column", not "drop only one random column". My bad, editing... – Corentin Lapeyre Sep 23 '17 at 06:20
  • This does indeed solve my first point (more elegant). Any idea how to address my second with this: adding a "batch_size" argument? Also: as you pointed out, this is still making copies, which isn't optimal. It seems `np.take` and all [indexing operations](https://docs.scipy.org/doc/numpy/reference/routines.indexing.html#indexing-like-operations) behave this way. – Corentin Lapeyre Sep 23 '17 at 06:45
  • `np.take(data, 0, 1)` returns a copy, but `data[:,0,:]` is a view. – hpaulj Sep 23 '17 at 06:59
  • I haven't paid much attention to your `batch` desire. – hpaulj Sep 23 '17 at 07:06