
I have a large one-dimensional numpy array, and I want to compute a sort of "group average". More specifically,

Let my array be [1,2,3,4,5,6,7,8,9,10] and let my group_size be 3. Hence, I will average the first three elements, the 4th to 6th elements, the 7th to 9th elements, and average the remaining elements (only one in this case) to get [2, 5, 8, 10]. Needless to say, I need a vectorized implementation.

Finally, my purpose is reducing the number of points in a noisy graph to smooth out a general pattern that has a lot of oscillation. Is there a correct way to do this? I would like the answer to both questions, in case they have different answers. Thanks!

martianwars
    To smooth out a noisy pattern, you more likely want a rolling average (i.e., average elements 0-2, then 1-3, then 2-4, etc., so that the averages overlap). The pandas library has such functionality built in. – BrenBarn Dec 15 '16 at 06:09
  • yes thank you, rolling average gives me a good result. Nevertheless, I'd love to know the first answer too! – martianwars Dec 15 '16 at 06:16
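As a sketch of the rolling-average suggestion from the comment above, using pandas' `Series.rolling` (the window size of 3 here is illustrative, not from the original posts):

```python
import numpy as np
import pandas as pd

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)

# Rolling mean over a window of 3. The first two entries are NaN
# because the window is not yet full at those positions.
rolled = pd.Series(a).rolling(window=3).mean()
print(rolled.values)
```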

3 Answers


A good smoothing technique is kernel convolution: a small array (the kernel) slides in a moving window over your larger array, and at each position the overlapping values are multiplied and summed.

Say you choose a standard smoothing kernel of 1/3 * [1,1,1] (a kernel should have an odd length and be normalized). Let's apply it to [1,2,2,7,3,4,9,4,5,6]:

The centre of the kernel starts on the first 2. It averages that element with its neighbours, then moves on. The result is this: [1.67, 3.67, 4.0, 4.67, 5.33, 5.67, 6.0, 5.0]

Note that the result is two elements shorter than the input: the first and last elements are dropped, because the kernel cannot be centred on them without running off the edge.

You can do this with numpy.convolve, for example:

import numpy as np
a = np.array([1, 2, 2, 7, 3, 4, 9, 4, 5, 6])
k = np.array([1, 1, 1]) / 3  # normalized smoothing kernel
smoothed = np.convolve(a, k, 'valid')

The effect of this is that each central value is smoothed with the values of its neighbours. You can change the convolution kernel by increasing its size, for example [1,1,1,1,1]/5, or use a Gaussian kernel, which weights the central members more heavily than the outer ones. Read the Wikipedia article on kernel smoothers.
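For example, a Gaussian-style kernel can be built and applied the same way (the weights below are illustrative binomial coefficients, not from this answer; the only requirement is that they are normalized):

```python
import numpy as np

a = np.array([1, 2, 2, 7, 3, 4, 9, 4, 5, 6], dtype=float)

# 5-point Gaussian-like kernel; normalize so the weights sum to 1.
g = np.array([1, 4, 6, 4, 1], dtype=float)
g /= g.sum()

# 'valid' drops the positions where the kernel overhangs the edges,
# so the result is len(a) - len(g) + 1 elements long.
smoothed = np.convolve(a, g, 'valid')
print(smoothed)
```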

EDIT

This works to get a block average as the question asks for:

import numpy as np

a = [1,2,3,4,5,6,7,8,9,10]
size = 3

new_a = []
i = 0
while i < len(a):
    val = np.mean(a[i:i+size])
    new_a.append(val)
    i+=size

print(new_a)

[2.0, 5.0, 8.0, 10.0]
Roman

To solve for the group averaging, listed below are two approaches.

Approach #1 : Bin-based summing and averaging

In [77]: a
Out[77]: array([74, 48, 92, 40, 35, 38, 20, 69, 82, 37])

In [78]: N = 3 # Window size

In [79]: np.arange(a.size)//N # IDs for binning with bincount
Out[79]: array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3])

In [84]: np.bincount(np.arange(a.size)//N,a)/np.bincount(np.arange(a.size)//N)
Out[84]: array([ 71.33333333,  37.66666667,  57.        ,  37.        ])

Approach #2 : Slice and reshape based averaging

In [134]: limit0 = N*(a.size//N)

In [135]: out = np.zeros((a.size+N-1)//N)

In [136]: out[:limit0//N] = a[:limit0].reshape(-1,N).mean(1)

In [137]: out[limit0//N:] = a[limit0:].mean()

In [138]: out
Out[138]: array([ 71.33333333,  37.66666667,  57.        ,  37.        ])

To smooth the data, I might suggest using MATLAB's smooth function ported to NumPy, which is essentially convolved averaging and should be similar to @Roman's post.
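A minimal NumPy sketch of such convolved averaging (the function name and edge handling below are my assumptions, not the ported code): near the edges the window covers fewer real samples, so dividing by the actual coverage avoids biasing the edge values toward zero.

```python
import numpy as np

def smooth(a, span=5):
    # Moving average via convolution with a uniform kernel.
    # 'same' mode keeps the output the same length as the input.
    a = np.asarray(a, dtype=float)
    kernel = np.ones(span)
    sums = np.convolve(a, kernel, 'same')
    # How many real samples fall under the kernel at each position,
    # so the edges are averaged over their true window size.
    counts = np.convolve(np.ones_like(a), kernel, 'same')
    return sums / counts

noisy = np.array([1, 2, 2, 7, 3, 4, 9, 4, 5, 6], dtype=float)
print(smooth(noisy, span=3))
```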

Divakar

Really, really wish numpy.ma.MaskedArray.resize worked. It would allow a one-step answer to this question.

As it is:

def groupAverage(arr, idx):
    rem = arr.size % idx
    if rem == 0:
        # Array divides evenly: one row per group, average each row.
        return arr.reshape(-1, idx).mean(axis=1)
    else:
        # Zero-pad up to a full multiple of idx, average each row,
        # then rescale the last average so it is divided by the
        # true number of leftover elements rather than by idx.
        newsize = (arr.size // idx + 1) * idx
        padded = np.zeros(newsize)
        padded[:arr.size] = arr
        averages = padded.reshape(-1, idx).mean(axis=1)
        averages[-1] *= idx / rem
        return averages
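For reference, the same zero-pad-then-rescale idea can be written compactly with np.pad (a sketch of my own, not from the answer above; `group_average` is a hypothetical name):

```python
import numpy as np

def group_average(arr, size):
    # Pad with zeros to a multiple of `size`, average each block,
    # then rescale the last block to divide by the true count.
    arr = np.asarray(arr, dtype=float)
    rem = arr.size % size
    pad = 0 if rem == 0 else size - rem
    out = np.pad(arr, (0, pad)).reshape(-1, size).mean(axis=1)
    if rem:
        out[-1] *= size / rem
    return out

print(group_average([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))
```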
Daniel F