I'm looking for an efficient way to segment numpy arrays into overlapping chunks. I know that numpy.lib.stride_tricks.as_strided
is probably the way to go, but I can't seem to wrap my head around its usage in a generalized function that works on arrays with arbitrary shape. Here are some examples for specific applications of as_strided
.
This is what I would like to have:
import numpy as np
from numpy.lib.stride_tricks import as_strided
def segment(arr, axis, new_len, step=1, new_axis=None):
""" Segment an array along some axis.
Parameters
----------
arr : array-like
The input array.
axis : int
The axis along which to segment.
new_len : int
The length of each segment.
step : int, default 1
The offset between the start of each segment.
new_axis : int, optional
The position where the newly created axis is to be inserted. By
default, the axis will be added at the end of the array.
Returns
-------
arr_seg : array-like
The segmented array.
"""
# calculate shape after segmenting
new_shape = list(arr.shape)
new_shape[axis] = (new_shape[axis] - new_len + step) // step
if new_axis is None:
new_shape.append(new_len)
else:
new_shape.insert(new_axis, new_len)
# TODO: calculate new strides
strides = magic_command_returning_strides(...)
# get view with new strides
arr_seg = as_strided(arr, new_shape, strides)
return arr_seg.copy()
So I would like to specify the axis that is going to be cut into segments, the length of the segments and the step between them. Additionally, I'd like to pass the position where the new axis is inserted as a parameter. The only thing that's missing is the calculation of the strides.
I'm aware that this might not work this way directly with as_strided
, i.e. I might need to implement a subroutine that returns a strided view with step=1
and new_axis
in a fixed position and then slice with the desired step
and transpose afterwards.
Here's a piece of code that works, but is obviously quite slow:
def segment_slow(arr, axis, new_len, step=1, new_axis=None):
""" Segment an array along some axis. """
# calculate shape after segmenting
new_shape = list(arr.shape)
new_shape[axis] = (new_shape[axis] - new_len + step) // step
if new_axis is None:
new_shape.append(new_len)
else:
new_shape.insert(new_axis, new_len)
# check if the new axis is inserted before the axis to be segmented
if new_axis is not None and new_axis <= axis:
axis_in_arr_seg = axis + 1
else:
axis_in_arr_seg = axis
# pre-allocate array
arr_seg = np.zeros(new_shape, dtype=arr.dtype)
# setup up indices
idx_old = [slice(None)] * arr.ndim
idx_new = [slice(None)] * len(new_shape)
# get order of transposition for assigning slices to the new array
order = list(range(arr.ndim))
if new_axis is None:
order[-1], order[axis] = order[axis], order[-1]
elif new_axis > axis:
order[new_axis-1], order[axis] = order[axis], order[new_axis-1]
# loop over the axis to be segmented
for n in range(new_shape[axis_in_arr_seg]):
idx_old[axis] = n * step + np.arange(new_len)
idx_new[axis_in_arr_seg] = n
arr_seg[tuple(idx_new)] = np.transpose(arr[idx_old], order)
return arr_seg
Here's a test for the basic functionality:
import numpy.testing as npt
arr = np.array([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11]])
arr_seg_1 = segment_slow(arr, axis=1, new_len=3, step=1)
arr_target_1 = np.array([[[0, 1, 2], [1, 2, 3]],
[[4, 5, 6], [5, 6, 7]],
[[8, 9, 10], [9, 10, 11]]])
npt.assert_allclose(arr_target_1, arr_seg_1)
arr_seg_2 = segment_slow(arr, axis=1, new_len=3, step=1, new_axis=1)
arr_target_2 = np.transpose(arr_target_1, (0, 2, 1))
npt.assert_allclose(arr_target_2, arr_seg_2)
arr_seg_3 = segment_slow(arr, axis=0, new_len=2, step=1)
arr_target_3 = np.array([[[0, 4], [1, 5], [2, 6], [3, 7]],
[[4, 8], [5, 9], [6, 10], [7, 11]]])
npt.assert_allclose(arr_target_3, arr_seg_3)
Any help would be greatly appreciated!