Selecting Random Windows from Multidimensional Numpy Array Rows

Question

I have a large array where each row is a time series and thus needs to stay in order.

I want to select a random window of a given size for each row.

Example:

>>> import numpy as np
>>> arr = np.array(range(42)).reshape(6,7)
>>> arr
array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39, 40, 41]])
>>> # What I want to do:
>>> select_random_windows(arr, window_size=3)
array([[ 1,  2,  3],
       [11, 12, 13],
       [14, 15, 16],
       [22, 23, 24],
       [30, 31, 32],
       [38, 39, 40]])

What an ideal solution would look like to me:

def select_random_windows(arr, window_size):
    offsets = np.random.randint(0, arr.shape[0] - window_size, size = arr.shape[1])
    return arr[:, offsets: offsets + window_size]

But unfortunately this does not work, because slice bounds have to be scalars, not arrays.

What I'm going with right now is terribly slow:

def select_random_windows(arr, window_size):
    result = []
    # One start offset per row; the exclusive upper bound keeps each window inside the row.
    offsets = np.random.randint(0, arr.shape[1] - window_size + 1, size=arr.shape[0])
    for row, offset in enumerate(offsets):
        result.append(arr[row][offset: offset + window_size])
    return np.array(result)

Sure, I could do the same with a list comprehension (and get a minimal speed boost), but I was wondering whether there is some smart vectorized NumPy way to do this.

score 9 · Accepted Answer · answered Dec 26 '17 at 19:40

Here's one leveraging np.lib.stride_tricks.as_strided -

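def random_windows_per_row_strided(arr, W=3):
    # One random start offset per row; the upper bound is exclusive.
    idx = np.random.randint(0,arr.shape[1]-W+1, arr.shape[0])
    strided = np.lib.stride_tricks.as_strided
    m,n = arr.shape
    s0,s1 = arr.strides
    # View of every length-W window in every row, shape (m, n-W+1, W); no copy is made.
    windows = strided(arr, shape=(m,n-W+1,W), strides=(s0,s1,s1))
    # Pick window idx[i] from row i.
    return windows[np.arange(len(idx)), idx]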
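
For readers on NumPy 1.20 or later, the same windowed view can probably be obtained with np.lib.stride_tricks.sliding_window_view, which computes the shape and strides itself and returns a read-only view. The sketch below is only an illustration of that alternative, not part of the original answer; the function name random_windows_per_row_swv and the rng parameter are made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def random_windows_per_row_swv(arr, W=3, rng=None):
    # rng is a hypothetical convenience parameter (not in the answer above)
    # so that results can be made reproducible.
    rng = np.random.default_rng() if rng is None else rng
    m, n = arr.shape
    # One start offset per row; the upper bound is exclusive, so n - W + 1
    # still allows the last valid window (columns n-W .. n-1).
    idx = rng.integers(0, n - W + 1, size=m)
    # Read-only view of shape (m, n - W + 1, W); no data is copied here.
    windows = sliding_window_view(arr, W, axis=1)
    # Select window idx[i] from row i; this indexing step makes the copy.
    return windows[np.arange(m), idx]
```

Calling it as random_windows_per_row_swv(arr, W=3) on the 6x7 example array should return a (6, 3) array with one contiguous window per row.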

Runtime test on a bigger array with 100,000 rows -

In [469]: arr = np.random.rand(100000,100)

# @Psidom's soln
In [470]: %timeit select_random_windows(arr, window_size=3)
100 loops, best of 3: 7.41 ms per loop

In [471]: %timeit random_windows_per_row_strided(arr, W=3)
100 loops, best of 3: 6.84 ms per loop

# @Psidom's soln
In [472]: %timeit select_random_windows(arr, window_size=30)
10 loops, best of 3: 26.8 ms per loop

In [473]: %timeit random_windows_per_row_strided(arr, W=30)
100 loops, best of 3: 9.65 ms per loop

# @Psidom's soln
In [474]: %timeit select_random_windows(arr, window_size=50)
10 loops, best of 3: 41.8 ms per loop

In [475]: %timeit random_windows_per_row_strided(arr, W=50)
100 loops, best of 3: 10 ms per loop

Thanks for that link. It's what I've been missing this whole time. Now I can finally add an implementation of weighted selection to numpy and enable bin width estimators for weighted histograms, a feature that's been bugging me for a while now. – Mad Physicist, Jan 03 '18 at 03:35
Fantastic answer, and a very clever use of as_strided. – user2699, Mar 01 '18 at 17:04
Can you provide an example of random_windows_per_row_strided with an N-dimensional array, for example an array of size (w,x,y,z), along any dimension (whether it be x, or z, or any of the others)? I had the exact same question as OP but with a higher-dimensional array. – lightbox142, Oct 25 '19 at 22:44

score 5 · Answer 2 · answered Dec 26 '17 at 19:33

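In the return statement, change the slicing to advanced indexing. You also need to fix the sampling code a little: draw one offset per row (size=arr.shape[0]) from the valid column range (upper bound arr.shape[1]-window_size+1):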

def select_random_windows(arr, window_size):
    offsets = np.random.randint(0, arr.shape[1]-window_size+1, size=arr.shape[0])
    return arr[np.arange(arr.shape[0])[:,None], offsets[:,None] + np.arange(window_size)]

select_random_windows(arr, 3)
#array([[ 4,  5,  6],
#       [ 7,  8,  9],
#       [17, 18, 19],
#       [25, 26, 27],
#       [31, 32, 33],
#       [39, 40, 41]])
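
To make the broadcasting in the return statement easier to follow, here is a small sketch with the offsets fixed by hand instead of drawn at random. The values in offsets are simply the ones that reproduce the output shown above; the names rows and cols are introduced here for illustration only.

```python
import numpy as np

arr = np.arange(42).reshape(6, 7)
window_size = 3
# Fixed start offsets, chosen by hand to reproduce the output shown above.
offsets = np.array([4, 0, 3, 4, 3, 4])

# Column vector of row numbers, shape (6, 1).
rows = np.arange(arr.shape[0])[:, None]
# Each row's offset plus 0, 1, 2 gives its column indices, shape (6, 3).
cols = offsets[:, None] + np.arange(window_size)
# rows broadcasts against cols, so result[i, j] == arr[i, offsets[i] + j].
result = arr[rows, cols]
print(result)
```

Because both index arrays broadcast to shape (6, 3), the result has that shape too, with one window per row.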

Psidom


Can you provide an example of random_windows_per_row_strided with an N-dimensional array, for example an array of size (w,x,y,z), along any dimension (whether it be x, or z, or any of the others)? I had the exact same question as OP but with a higher-dimensional array (hence the title of the entire post). – lightbox142 Oct 25 '19 at 23:22
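
One possible way to handle the N-dimensional case raised in this comment is sketched below. It is not taken from either answer: it applies the same offset-plus-arange idea as the advanced-indexing answer, but goes through np.moveaxis and np.take_along_axis so that the window can run along any axis. The function name random_windows_along_axis and the rng parameter are hypothetical.

```python
import numpy as np

def random_windows_along_axis(arr, window_size, axis=-1, rng=None):
    # rng is a hypothetical convenience parameter for reproducibility.
    rng = np.random.default_rng() if rng is None else rng
    # Bring the windowed axis to the end so every other axis acts as a "row" axis.
    moved = np.moveaxis(arr, axis, -1)
    n = moved.shape[-1]
    # One independent start offset for every position along the remaining axes.
    offsets = rng.integers(0, n - window_size + 1, size=moved.shape[:-1])
    # Indices of each window along the last axis, shape (..., window_size).
    index = offsets[..., None] + np.arange(window_size)
    picked = np.take_along_axis(moved, index, axis=-1)
    # Put the (now shorter) windowed axis back where it was.
    return np.moveaxis(picked, -1, axis)
```

For an array of shape (w, x, y, z), calling random_windows_along_axis(a, 5, axis=2) should return shape (w, x, 5, z), with a separate random window for each (w, x, z) position.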
