Edit:
@Kevin IMO if you're training a network anyway, you should do this step with a fully connected layer. That said...
I have a non-vectorized solution if you want something to work with. Any solution will be memory intensive. On my laptop it works sorta fast for CIFAR-sized gray images (32x32). Maybe the key step could be vectorized by someone clever.
First split a test array arr into windows win using skimage. This is test data.
>>> import numpy as np
>>> from skimage.util.shape import view_as_windows as viewW
>>> arr = np.arange(20).reshape(5,4)
>>> win = viewW(arr, (3,3))
>>> arr # test data
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]])
>>> win[0,0]==arr[:3,:3] # it works.
array([[ True, True, True],
[ True, True, True],
[ True, True, True]])
Now to recombine, generate an output array out with shape (5,4,6). 6 is the number of windows in win and (5,4) is arr.shape. We will populate this array with one window in each slice along the -1 axis.
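As a quick check on where the 6 comes from: with a stride of 1, a (3,3) window fits (5-3+1) times vertically and (4-3+1) times horizontally. The sketch below uses NumPy's own sliding_window_view as a stand-in for skimage's view_as_windows (an assumption on my part, so it runs without skimage); both give a (3,2,3,3) array here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # stand-in for view_as_windows

arr = np.arange(20).reshape(5, 4)
win = sliding_window_view(arr, (3, 3))

# number of window positions along each axis at stride 1
n_rows = arr.shape[0] - 3 + 1
n_cols = arr.shape[1] - 3 + 1
n_windows = n_rows * n_cols

print(win.shape)   # (3, 2, 3, 3)
print(n_windows)   # 6
```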
# the array to be filled
out = np.zeros((5,4,6)) # shape of original arr stacked to the number of windows
# now make the set of indices of the window corners in arr
inds = np.indices((3,2)).T.reshape(3*2,2)
# and generate a list of slices. each selects the position of one window in out
slices = [np.s_[i[0]:i[0]+3:1,i[1]:i[1]+3:1,j] for i,j in zip(inds,range(6))]
# this will be the slow part. You have to loop through the slices.
# does anyone know a vectorized way to do this?
for (ii,jj),slc in zip(inds,slices):
    out[slc] = win[ii,jj,:,:] # index with the single slice slc, not the whole list
Now the out array contains all of the windows in their proper positions but separated into panes across the -1 axis. To extract your original array you can average, down this axis, all the elements that are nonzero.
>>> out = np.true_divide(out.sum(-1),(out!=0).sum(-1))
>>> # this can't handle the scenario where all elements in an out[i,j,:] are 0 (0/0 gives nan)
>>> # so set nan to zero
>>> out = np.nan_to_num(out)
>>> out
array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[12., 13., 14., 15.],
[16., 17., 18., 19.]])
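One caveat with the nonzero test: it treats pixels that are genuinely 0 as "not covered", which is why the nan_to_num step is needed. A possible alternative, sketched here as my own variation rather than part of the method above, is to pad the panes with NaN and let np.nanmean skip only the padding. The name panes and the use of sliding_window_view (in place of view_as_windows) are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # stand-in for view_as_windows

arr = np.arange(20).reshape(5, 4)
win = sliding_window_view(arr, (3, 3))       # shape (3, 2, 3, 3)
a0, b0, h, w = win.shape

# one pane per window, NaN wherever that window does not reach
panes = np.full((arr.shape[0], arr.shape[1], a0 * b0), np.nan)
for k, (i, j) in enumerate(np.ndindex(a0, b0)):
    panes[i:i + h, j:j + w, k] = win[i, j]

# average over the panes, ignoring the NaN padding but keeping real zeros
restored = np.nanmean(panes, axis=-1)
print(np.array_equal(restored, arr))
```

Every pixel is covered by at least one window at stride 1, so no position is all-NaN and no cleanup step is needed afterwards.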
Can you think up a way to operate over an array of slices in a vectorized way?
All together:
def from_windows(win):
    """Take an array of windows win and return the original array they came from."""
    a0,b0,w,_ = win.shape # (a0,b0) grid of windows, each w x w (square windows assumed)
    a,b = a0+w-1,b0+w-1 # a,b are shape of original image
    n = a0*b0 # number of windows
    out = np.zeros((a,b,n)) # empty output to be averaged over last axis
    inds = np.indices((a0,b0)).T.reshape(a0*b0,2) # indices of window corners into out
    slices = [np.s_[i[0]:i[0]+w:1,i[1]:i[1]+w:1,j] for i,j in zip(inds,range(n))] # make em slices
    for (ii,jj),slc in zip(inds,slices): # do the replacement into out
        out[slc] = win[ii,jj,:,:]
    out = np.true_divide(out.sum(-1),(out!=0).sum(-1)) # average over all nonzeros
    out = np.nan_to_num(out) # replace any nans left where every element of out[i,j,:] is 0
    return out # hope you've got ram
and the test:
>>> arr = np.arange(32**2).reshape(32,32)
>>> win = viewW(arr, (3,3))
>>> np.all(arr==from_windows(win)) # np.alltrue was removed in NumPy 2.0
True
>>> %timeit from_windows(win)
6.3 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Practically speaking, this is not going to be fast enough for you to train on.
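On the open question of vectorizing the loop over slices, here is one possible approach, offered as a sketch and not benchmarked: compute the target row and column of every window element by broadcasting, then scatter-add values and coverage counts with np.add.at. The function name from_windows_vectorized is made up for illustration, and sliding_window_view again stands in for view_as_windows. This variant also avoids allocating the (a,b,n) array.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # stand-in for view_as_windows

def from_windows_vectorized(win):
    """Rebuild the original 2D array from stride-1 windows (hypothetical helper)."""
    a0, b0, h, w = win.shape
    a, b = a0 + h - 1, b0 + w - 1
    # target row/column in the original array for each element of each window
    rows = np.arange(a0)[:, None, None, None] + np.arange(h)[None, None, :, None]
    cols = np.arange(b0)[None, :, None, None] + np.arange(w)[None, None, None, :]
    rows, cols = np.broadcast_arrays(rows, cols)   # both (a0, b0, h, w)
    total = np.zeros((a, b))
    count = np.zeros((a, b))
    np.add.at(total, (rows, cols), win)   # unbuffered add, so repeated indices accumulate
    np.add.at(count, (rows, cols), 1)     # how many windows cover each pixel
    return total / count                  # every pixel is covered at least once

arr = np.arange(32**2).reshape(32, 32)
win = sliding_window_view(arr, (3, 3))
print(np.array_equal(from_windows_vectorized(win), arr))
```

Because coverage is counted explicitly, genuine zeros need no special handling. np.add.at is known to be slower than plain fancy-index assignment, so whether this beats the loop is something to time on your own data.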