subsampling every nth entry in a numpy array

Question

I am a beginner with numpy, and I am trying to extract some data from a long numpy array. What I need to do is start from a defined position in my array, and then subsample every nth data point from that position, until the end of my array.

Basically, if I had

a = [1,2,3,4,1,2,3,4,1,2,3,4....]

I want to subsample this to start at a[1] and then sample every fourth point from there, to produce something like

b = [2,2,2.....]

behzad.nouri · Answer 1 · 2016-01-22T15:08:39.170

You can use numpy's slicing, simply start:stop:step.

>>> xs
array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
>>> xs[1::4]
array([2, 2, 2])
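
As a minimal sketch of how the three slice fields map onto the question: the variable names `start` and `step` and the stop value `9` are illustrative choices, not something from the question itself. Leaving `stop` empty means "run to the end of the array".

```python
import numpy as np

xs = np.array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])

# Illustrative values: begin at index 1, take every 4th element.
start, step = 1, 4

# Omitting stop runs to the end of the array.
every_fourth = xs[start::step]

# An explicit stop (exclusive) limits the range; 9 is an arbitrary example.
bounded = xs[start:9:step]

print(every_fourth)  # [2 2 2]
print(bounded)       # [2 2]
```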

This creates a view of the original data, so it takes constant time. The view shares memory with the original array: changes made through one are visible in the other, and the whole original array stays in memory for as long as the view exists:

>>> a
array([1, 2, 3, 4, 5])
>>> b = a[::2]         # O(1), constant time
>>> b[:] = 0           # modifying the view changes original array
>>> a                  # original array is modified
array([0, 2, 0, 4, 0])

If either of these is a problem, you can make a copy explicitly:

>>> a
array([1, 2, 3, 4, 5])
>>> b = a[::2].copy()  # explicit copy, O(n)
>>> b[:] = 0           # modifying the copy
>>> a                  # original is intact
array([1, 2, 3, 4, 5])

This isn't constant time, but the result isn't tied to the original array. The copy is also contiguous in memory, which can make some operations on it faster.
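
If you want to check which of the two you are holding, the sketch below is one way to do it. The array contents and variable names are made up for illustration. It uses `np.shares_memory` and the `flags` attribute to tell a strided view from an independent copy, and `np.ascontiguousarray` as an alternative way to get a contiguous buffer.

```python
import numpy as np

a = np.arange(10)

view = a[1::4]          # strided view into a's buffer
copy = a[1::4].copy()   # independent, freshly allocated buffer

# A view shares memory with its source; a copy does not.
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False

# The strided view skips over elements, so it is not contiguous.
print(view.flags['C_CONTIGUOUS'])  # False
print(copy.flags['C_CONTIGUOUS'])  # True

# ascontiguousarray copies only when the input is not already contiguous.
packed = np.ascontiguousarray(view)
print(packed.flags['C_CONTIGUOUS'])  # True
```

This matters whenever the data leaves numpy: code that reads the underlying buffer through a bare pointer and ignores the strides will see the elements of the original array rather than the subsampled ones, so an explicit copy is the safer thing to hand over.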

thank you for warning about the reference/value copy. i would otherwise definitely fall into the trap — F.S., Jun 15 '17 at 18:43
a[::2].copy() solved my issue. With just a[::2] when I would import this as a numpy array into C using ctypes, I was getting almost garbage result (my array was read as if I never reduced it). This was also fixable if I would run it through numpy.clip() or explicitly copy each variable in a for loop. Not sure if this is a bug.. — VSB, Feb 23 '21 at 01:49

score 0 · Answer 2 · edited Aug 22 '22 at 20:33

Complementary to behzad.nouri's answer: if you want to fix the number of elements in the result to a predefined value (rather than fixing the step between subsamples), you can use numpy's linspace function followed by integer rounding.

For example, to pick 4 elements (num=4) starting at index 1:

>>> import numpy as np
>>> a
array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> choice = np.round(np.linspace(1, len(a)-1, num=4)).astype(int)
>>> a[choice]
array([ 2,  5,  7, 10])

Or, more generally, to subsample an array between fixed start and end indices:

>>> import numpy as np
>>> np.round(np.linspace(0, len(a)-1, num=4)).astype(int)
array([0, 3, 6, 9])
>>> np.round(np.linspace(0, len(a)-1, num=15)).astype(int)
array([0, 1, 1, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 8, 9])
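
The same idea can be wrapped in a small helper. This is only a sketch: the function name `subsample_fixed_count` and its parameters are hypothetical, not part of numpy. It always ends at the last index, as in the examples above.

```python
import numpy as np

def subsample_fixed_count(arr, num, start=0):
    """Return `num` elements of `arr`, spread from `start` to the last index.

    Hypothetical helper for illustration; not a numpy function.
    """
    arr = np.asarray(arr)
    # Evenly spaced positions, rounded to the nearest valid integer index.
    idx = np.round(np.linspace(start, len(arr) - 1, num=num)).astype(int)
    # Indexing with an integer array returns a copy, not a view.
    return arr[idx]

a = np.arange(1, 11)
print(subsample_fixed_count(a, 4, start=1))  # [ 2  5  7 10]
print(subsample_fixed_count(a, 4))           # [ 1  4  7 10]
```

Two things to keep in mind: unlike slicing, indexing with an integer array always produces a copy, and when `num` is larger than the number of available positions some indices repeat, as the num=15 example shows.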
