2

Let's say I want to select a value from a different column for each row. Then, I might do something like this:

a = np.arange(12).reshape(3, 4)
columns = np.array([1, 2, 0])
a[np.arange(a.shape[0]), columns]

It seems a bit 'ugly' to me to need to specify the entire range; moreover, even the arange call takes time:

%timeit np.arange(int(1e6))
1.03 ms ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Is there a way to avoid using arange?

Generalizing the above question; how would one go about selecting not single values, but different adjacent sets of columns (each set of equal size) for each row? I would like to avoid creating many manual aranges, like so:

rows = np.array([0, 2])
start_values = np.array([0, 1])
window_length = 3
column_ranges = np.array(list(map(lambda j: np.arange(j, j + window_length), start_values)))

Right now, the only way I see to use the above column ranges is to index like so:

a[rows, :][:, column_ranges][np.arange(len(rows)), np.arange(len(rows)), :]

Ideally, I'd like to use a notation like a[:, columns] instead of a[np.arange(a.shape[0]), columns], and a[:, columns:columns + window_length] instead of a[rows, :][:, column_ranges][np.arange(len(rows)), np.arange(len(rows)), :].

vandenheuvel
  • 349
  • 2
  • 13
  • `a[:, columns]` should work? – Quang Hoang Nov 26 '19 at 17:04
  • ```a[:, columns]``` is used to select entire columns of an matrix, whereas I would like to select different columns for each row. – vandenheuvel Nov 26 '19 at 17:07
  • In the simpler case, described first, it concerns only a single value, so in that case, this is not a problem. In the more general case I described after, it concerns sets of equal length, so again, this is not a problem. I'll clarify the phrasing. – vandenheuvel Nov 26 '19 at 17:11

1 Answers1

1

We can get sliding windows and then index those with the start indices along the rows and cols for our desired output. To get those windows, we can leverage np.lib.stride_tricks.as_strided based scikit-image's view_as_windows. More info on use of as_strided based view_as_windows. This would be mostly inspired by this post.

from skimage.util.shape import view_as_windows

def windows_per_row_vas(arr, rows, cols, W):
    w = view_as_windows(a,(1,W))[...,0,:]
    return w[rows,cols]

If you want to get your hands dirty with a crude implementation using np.lib.stride_tricks.as_strided -

def windows_per_row_strided(arr, rows, cols, W):
    strided = np.lib.stride_tricks.as_strided 
    m,n = arr.shape
    s0,s1 = arr.strides
    windows = strided(arr, shape=(m,n-W+1,W), strides=(s0,s1,s1))
    return windows[rows, cols]

Why use views/strided?

Because the windows are simply views into input, hence no memory overhead. It's only at the final step, when getting the output, we need extra memory space to hold the required slices, which are required anyway.

Sample run -

In [9]: a
Out[9]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [10]: rows = np.array([0, 2])
    ...: start_values = np.array([0, 1])
    ...: window_length = 3

In [11]: windows_per_row_strided(a, rows, start_values, window_length)
Out[11]: 
array([[ 0,  1,  2],
       [ 9, 10, 11]])


In [29]: windows_per_row_vas(a, rows, start_values, window_length)
Out[29]: 
array([[ 0,  1,  2],
       [ 9, 10, 11]])
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • This answers my question well, thank you. The code is also fast. When I try to do `windows = np.lib.stride_tricks.as_strided(a, shape=(3, 4 - 3 + 1, 3), strides=(32, 8, 8)); windows[rows, start_values][0, 0] = 999` I notice that `a` is not modified; could you explain when a copy gets made? At the moment of assignment as a copy on write? – vandenheuvel Nov 26 '19 at 18:00
  • 1
    `windows[rows, start_values][0, 0]` becomes a copy and hence assigning to it doesn't go back to `a`. But `windows[rows, start_values]` is a view and so are `windows[rows[0], start_values[0]]`, etc. Hence, `windows[rows, start_values] = 99`, `windows[rows[0], start_values[0]] = 99`, etc. would work. – Divakar Nov 26 '19 at 18:10