
I am a beginner with numpy, and I am trying to extract some data from a long numpy array. What I need to do is start from a defined position in my array, and then subsample every nth data point from that position, until the end of my array.

Basically, if I had

a = [1,2,3,4,1,2,3,4,1,2,3,4....] 

I want to subsample this to start at a[1] and then sample every fourth point from there, to produce something like

b = [2,2,2.....]
behzad.nouri
Rich Williams

2 Answers


You can use NumPy's slicing syntax, start:stop:step. Omitting stop runs the slice to the end of the array.

>>> xs
array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
>>> xs[1::4]
array([2, 2, 2])
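
As a small self-contained sketch of the same slice with the start and step held in variables (the names `start` and `step` and the sample data built with `np.tile` are illustrative choices, not part of the question):

```python
import numpy as np

# Sample data shaped like the question's array: 1, 2, 3, 4 repeated three times.
a = np.tile([1, 2, 3, 4], 3)

# Hypothetical values chosen to match the question: begin at index 1, take every 4th point.
start, step = 1, 4

# Leaving out `stop` makes the slice run to the end of the array.
b = a[start::step]
print(b)  # [2 2 2]
```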

This creates a view of the original data, so it takes constant time regardless of the array's length. The view also reflects changes to the original array (and vice versa) and keeps the whole original array in memory:

>>> a
array([1, 2, 3, 4, 5])
>>> b = a[::2]         # O(1), constant time
>>> b[:] = 0           # modifying the view changes original array
>>> a                  # original array is modified
array([0, 2, 0, 4, 0])

If either of those is a problem, you can make a copy explicitly:

>>> a
array([1, 2, 3, 4, 5])
>>> b = a[::2].copy()  # explicit copy, O(n)
>>> b[:] = 0           # modifying the copy
>>> a                  # original is intact
array([1, 2, 3, 4, 5])

This isn't constant time, but the result isn't tied to the original array. The copy is also contiguous in memory, which can make some operations on it faster.
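
If it is unclear whether a given array is a view or a copy, one way to check is sketched below. It assumes a 1-D array built with `np.arange`; the variable names are illustrative. `np.ascontiguousarray` is shown as an alternative to `.copy()` for cases where contiguous memory is what matters, for example when handing the buffer to C code.

```python
import numpy as np

a = np.arange(10)

view = a[::2]          # strided view onto a's buffer
copy = a[::2].copy()   # independent, contiguous array

# A view shares memory with the original; a copy does not.
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False

# The view skips every other element, so it is not contiguous.
print(view.flags['C_CONTIGUOUS'])  # False
print(copy.flags['C_CONTIGUOUS'])  # True

# ascontiguousarray copies only if the input is not already contiguous.
packed = np.ascontiguousarray(view)
print(packed.flags['C_CONTIGUOUS'])  # True
```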

behzad.nouri
  • thank you for warning about the reference/value copy. i would otherwise definitely fall into the trap – F.S. Jun 15 '17 at 18:43
  • a[::2].copy() solved my issue. With just a[::2] when I would import this as a numpy array into C using ctypes, I was getting almost garbage result (my array was read as if I never reduced it). This was also fixable if I would run it through numpy.clip() or explicitly copy each variable in a for loop. Not sure if this is a bug.. – VSB Feb 23 '21 at 01:49

Complementary to behzad.nouri's answer: if you want to fix the number of output elements in advance (rather than the step between samples), you can use NumPy's linspace function to generate evenly spaced positions and then round them to integer indices.

For example, with 4 output elements (num=4), starting at index 1:

>>> a
array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> choice = np.round(np.linspace(1, len(a)-1, num=4)).astype(int)
>>> a[choice]
array([ 2,  5,  7, 10])

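Or, more generally, the indices for subsampling an array between fixed start and end points. Note that when num exceeds the number of available indices, some indices repeat: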

>>> import numpy as np
>>> np.round(np.linspace(0, len(a)-1, num=4)).astype(int)
array([0, 3, 6, 9])
>>> np.round(np.linspace(0, len(a)-1, num=15)).astype(int)
array([0, 1, 1, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 8, 9])
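
The linspace approach could be wrapped in a small helper, sketched here. The function name `subsample_fixed_count` and its `start` parameter are hypothetical, introduced only for illustration. Unlike a slice, indexing with an integer array returns a copy rather than a view.

```python
import numpy as np

def subsample_fixed_count(a, num, start=0):
    """Return `num` elements of `a`, roughly evenly spaced from `start` to the last index.

    Hypothetical helper for illustration; not a NumPy function.
    """
    a = np.asarray(a)
    # Evenly spaced float positions, rounded to the nearest integer index.
    idx = np.round(np.linspace(start, len(a) - 1, num=num)).astype(int)
    return a[idx]  # integer-array indexing produces a copy

a = np.arange(1, 11)                         # 1, 2, ..., 10
print(subsample_fixed_count(a, 4, start=1))  # [ 2  5  7 10]
print(subsample_fixed_count(a, 4))           # [ 1  4  7 10]
```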
xjcl
Gabriele