
How can I select a random window from a numpy array greater than 2 dimensions wherein the window is random with respect to 2 different dimensions?

I'd like to do something similar to the answer in this post but in 3 dimensions, not 2: Selecting Random Windows from Multidimensional Numpy Array Rows

Example of what I am trying to vectorize (i.e. I'm trying to avoid a for loop):

import random
import numpy as np

ls = []
m = 3 # sequence length
k = 8 #batch_size

np_3D_array = np.random.randint(0,100, size = (5,7,4)) #random 3D array

for ii in range(k):
  random_sheet = random.randint(0,np_3D_array.shape[0] - 1)
  random_row = random.randint(0, np_3D_array.shape[1] - m)
  ls.append(np_3D_array[random_sheet, random_row:random_row + m , :])

final_output = np.array(ls)

print(final_output.shape) # prints (8, 3, 4) to stdout
Divakar
lightbox142

2 Answers

0

We can use scikit-image's view_as_windows, which is built on np.lib.stride_tricks.as_strided, to get sliding windows. More info on the use of as_strided-based view_as_windows.

from skimage.util.shape import view_as_windows

w = view_as_windows(np_3D_array,(1,m,1))[...,0,:,0]
r1 = np.random.randint(0,np_3D_array.shape[0], k)
r2 = np.random.randint(0, np_3D_array.shape[1] - m + 1, k)
final_output = w[r1,r2].swapaxes(1,2)

Here, view_as_windows is a convenience function that sets up the sliding windows for us, so we don't have to configure as_strided by hand.
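If scikit-image is not available, the same idea can be sketched with NumPy alone. This assumes NumPy 1.20 or later, which provides sliding_window_view; it is a stand-in for view_as_windows, not the code from the answer above. The seeded generator rng and the toy array are only there to make the example reproducible.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

m = 3  # sequence length
k = 8  # batch size

rng = np.random.default_rng(0)  # seeded only for reproducibility
np_3D_array = rng.integers(0, 100, size=(5, 7, 4))

# Windows of length m along axis 1. The window axis is appended last,
# so w has shape (5, 7 - m + 1, 4, m).
w = sliding_window_view(np_3D_array, m, axis=1)

# One random sheet and one random start row per sample.
# The upper bound of rng.integers is exclusive, like np.random.randint.
r1 = rng.integers(0, np_3D_array.shape[0], size=k)
r2 = rng.integers(0, np_3D_array.shape[1] - m + 1, size=k)

# Fancy indexing yields (k, 4, m); swap the last two axes to get (k, m, 4).
final_output = w[r1, r2].swapaxes(1, 2)

print(final_output.shape)
```

Each row of final_output should match np_3D_array[r1[i], r2[i]:r2[i] + m, :], which is what the loop in the question builds.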

Divakar
  • I believe r1 and r2 should be these instead because random.randint is inclusive for the larger number and np.random.randint is exclusive (if I understand your solution correctly?): r1 = np.random.randint(0,np_3D_array.shape[0], k) r2 = np.random.randint(0, np_3D_array.shape[1] - m + 1, k) – lightbox142 Oct 28 '19 at 20:57
  • @teter123f My bad, I thought `random.randint` was same as `np.random.randint`, but the former uses the second arg as inclusive. Post edited. Please check it out. – Divakar Oct 28 '19 at 21:06
  • mmm, I'm getting this error as I try this solution (idk if it's an issue with my input np_3D_array. The way I make np_3D_array is by reading csv files of shape a x b and appending each to a list and calling np.array(the list) to create a c x a x b ndarray.... "/usr/local/lib/python3.6/dist-packages/skimage/util/shape.py:246: RuntimeWarning: Cannot provide views on a non-contiguous input array without copying. warn(RuntimeWarning("Cannot provide views on a non-contiguous input " – lightbox142 Oct 30 '19 at 19:03
  • @teter123f For this solution to work, the input must be contiguous, which apparently isn't the case with your input it seems. So, one way to get around without knowing your source of input would be to make a copy of `np_3D_array` and then using it. Hence, use something like `ar = np_3D_array.copy()` and then using `ar` in place of `np_3D_array` in the posted code. – Divakar Oct 30 '19 at 19:06
  • I still get the non-contiguous error when I use np_3D_array.copy(). I was able to get rid of the error by using np.ascontiguousarray(np_3D_array). However, this solution seems to be actually slower for my particular use case. I need to look into why this vectorized format is - ironically - slower. (My np_3D_array is about 150 x 50000 x 10 and k - i.e. number of for loop passes - is about 3000 so .... very confused why it's slower) – lightbox142 Oct 31 '19 at 19:23
  • @teter123f Yeah that forcing a copy with explicit one or `np.ascontiguousarray` isn't helping. Let me ask you - How are you getting `np_3D_array`? – Divakar Nov 01 '19 at 14:09
  • Really simple honestly, I read 150 csv files of size 50000 x 10. After that I append one by one to a list. After, I use np.array(the list) to get np_3D_array of size 150 x 50000 x 10. – lightbox142 Nov 01 '19 at 19:12
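For anyone hitting the same warning, a quick way to see whether an array is C-contiguous is to inspect its flags. The sketch below uses a small made-up array and a transposed view to produce a non-contiguous input; it does not reproduce the asker's CSV-derived data and makes no claim about why the warning appeared there.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
t = a.transpose(1, 0, 2)  # a view with permuted strides, no data copied

a_is_contiguous = a.flags['C_CONTIGUOUS']
t_is_contiguous = t.flags['C_CONTIGUOUS']

# np.ascontiguousarray returns a C-ordered array, copying only if needed.
c = np.ascontiguousarray(t)
c_is_contiguous = c.flags['C_CONTIGUOUS']

print(a_is_contiguous, t_is_contiguous, c_is_contiguous)
```

Note that the contiguous copy costs memory proportional to the full array, which matters for an input of roughly 150 x 50000 x 10.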
0

Using my window_nd recipe from here

def sample_nd(arr, window_shape, axis, k = 1):
    windows = window_nd(arr, window = window_shape, axis = axis)
    windows = windows.reshape((-1,) + windows.shape[len(axis):])
    index = np.random.randint(0, windows.shape[0], k)
    return windows[index].squeeze()

sample_nd(np_3D_array, window_shape = (1, 3), axis = (0, 1), k = 8).shape

(8, 3, 4)

For clarity, there are many edge cases handled in the original function that are not accounted for here (most notably, this version does not work with a single window/axis unless they are passed as tuples).

Daniel F