
I have a numpy array. I want to create a new array which is the average over every consecutive triplet of elements, so the new array will be a third of the size of the original.

As an example:

 np.array([1,2,3,1,2,3,1,2,3])

should return the array:

 np.array([2,2,2])

Can anyone suggest an efficient way of doing this? I'm drawing a blank.

user1654183

3 Answers


If your array arr has a length divisible by 3:

np.mean(arr.reshape(-1, 3), axis=1)

Reshaping to a higher-dimensional array and then performing some form of reduce operation on one of the additional dimensions is a staple of numpy programming.
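
Applied to the example from the question, this gives exactly the requested result (a quick sanity check; nothing here beyond the answer's one-liner):

import numpy as np

arr = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])

# reshape into rows of three, then average across each row
np.mean(arr.reshape(-1, 3), axis=1)
# array([2., 2., 2.])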

Jaime
  • Jaime - thank you, that is a very elegant way of doing things. Do you have any advice for where one can read about these so-called 'staples of numpy programming'? – user1654183 Apr 14 '13 at 21:55
  • If `arr` length is not divisible by 3, you can do something like: `arr = np.nanmean(np.pad(arr.astype(float), (0, 3 - arr.size%3), mode='constant', constant_values=np.NaN).reshape(-1, 3), axis=1)` – plong0 Jul 31 '17 at 10:01
  • That padding comment by @plong0 helped me, but to make it general so that it also works when your array is divisible by 3, I had to add another mod to the padding sizes: `(0, ((3 - arr.size % 3) % 3))`, or something like `(0, 0 if arr.size % 3 == 0 else 3 - arr.size % 3)` – Scott Staniewicz Oct 04 '18 at 17:32
  • For an array not necessarily divisible by 3, I used `np.mean(arr[:(len(arr)//3)*3].reshape(-1,3), axis=1)`, which seems a lot simpler to me. I believe this will work for python2 and python3. – Chris Dec 17 '18 at 10:01
  • @Chris That is not the same, because it simply discards the data in the last group (if it is not a full group of 3), whereas the solutions above also work on the remainder group. – bluenote10 Sep 15 '19 at 12:26
  • @bluenote10: I consider that a plus, not a negative. You don't really want to include values which have "less" averaging and (in my use case) should be discarded. – Chris Jan 18 '21 at 22:56
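
Putting the comment suggestions together, here is a minimal sketch of both ways to handle a length that is not a multiple of 3 (the example array is illustrative, not from the thread):

import numpy as np

arr = np.array([1, 2, 3, 1, 2, 3, 1])  # length 7, not divisible by 3

# Option 1 (plong0 / Scott Staniewicz): pad with NaN and use nanmean,
# so the remainder group is averaged over the elements it actually has
pad = (3 - arr.size % 3) % 3
padded = np.pad(arr.astype(float), (0, pad), mode='constant', constant_values=np.nan)
np.nanmean(padded.reshape(-1, 3), axis=1)
# array([2., 2., 1.])

# Option 2 (Chris): truncate to a multiple of 3, discarding the remainder
np.mean(arr[:(len(arr) // 3) * 3].reshape(-1, 3), axis=1)
# array([2., 2.])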

For googlers looking for a simple generalisation to arrays with multiple dimensions: use the function block_reduce from the scikit-image module (skimage.measure.block_reduce).

It has a very simple interface for downsampling arrays by applying a function such as numpy.mean, but other reducers (maximum, median, ...) can be used as well. The downsampling can be done by different factors along different axes by supplying a tuple with the block size for each axis. Here's an example with a 2D array, downsampling only axis 1 by a factor of 5 using the mean:

import numpy as np
from skimage.measure import block_reduce

arr = np.stack((np.arange(1,20), np.arange(20,39)))

# array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
#        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]])

arr_reduced = block_reduce(arr, block_size=(1,5), func=np.mean, cval=np.mean(arr))

# array([[ 3. ,  8. , 13. , 17.8],
#        [22. , 27. , 32. , 33. ]])

As discussed in the comments on the other answer: if the array's length along the reduced dimension is not divisible by the block size, the padding values are provided by the cval argument (0 by default).
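
For the 1D case from the original question, the same function works with a one-element block size (a small sketch, assuming scikit-image is installed):

import numpy as np
from skimage.measure import block_reduce

arr = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])

# average every consecutive triplet of elements
block_reduce(arr, block_size=(3,), func=np.mean)
# array([2., 2., 2.])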

L_W

To apply the accepted answer to a 2D array, averaging each column/feature independently:

arr.reshape(-1, downsample_ratio, arr.shape[1]).mean(axis = 1)
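
For instance, a minimal sketch (the sample array and downsample_ratio value are illustrative; the number of rows must be divisible by downsample_ratio):

import numpy as np

downsample_ratio = 3
arr = np.array([[1, 10],
                [2, 20],
                [3, 30],
                [1, 10],
                [2, 20],
                [3, 30]])

# group rows into blocks of 3 and average within each block, per column
arr.reshape(-1, downsample_ratio, arr.shape[1]).mean(axis=1)
# array([[ 2., 20.],
#        [ 2., 20.]])
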
meliksahturker