
I have data that consists of an array of times, with 10 data points per second, and an array of intensity values corresponding to each time. So, for example, let's say that I have:

import numpy as np

times = np.arange(0, 100, 0.1)            # 10 samples per second for 100 seconds
intensities = np.random.rand(len(times))

I want to see what the data would look like if I used a longer averaging time, so I want to create some bins of, say, 1 second, 5 seconds, and 10 seconds, and average the intensity values in those new bins. What is the best way to do this in NumPy? (Or another Python package, but I'm assuming numpy/scipy has something for me.) I could use a for loop, but I'm hoping there is a better way. Thanks!

DanHickstein
  • If you're going to do anything other than trivial processing of these numbers, it's well worth your time to look at [pandas](http://pandas.pydata.org/pandas-docs/stable/api.html#standard-moving-window-functions), which is specifically designed for this problem domain – Mike Pennington Apr 11 '13 at 23:47
  • I'm generally fairly afraid of bears, but if these pandas are as well trained as you say they are, I'll be sure to enlist their help in the future. – DanHickstein Apr 12 '13 at 03:20
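For reference, here is a minimal sketch of the binning with pandas, as suggested in the comment above (assuming a reasonably recent pandas; the variable names are just for illustration):

import numpy as np
import pandas as pd

times = np.arange(0, 100, 0.1)
intensities = np.random.rand(len(times))

# Index the intensities by a timedelta so resample() knows the sample spacing
s = pd.Series(intensities, index=pd.to_timedelta(times, unit='s'))

# Average into 1-second, 5-second, and 10-second bins
avg_1s = s.resample('1s').mean()
avg_5s = s.resample('5s').mean()
avg_10s = s.resample('10s').mean()

# A centered 10-sample (1-second) moving average, for comparison
rolling_1s = s.rolling(10, center=True).mean()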

2 Answers


You can calculate moving averages using np.convolve, as mentioned on Stack Overflow here.

from pylab import plot, show
import numpy as np

times = np.arange(0, 100, 0.1)
intensities = np.random.rand(len(times))

def window(size):
    # Flat (boxcar) window whose weights sum to 1
    return np.ones(size) / float(size)

plot(times, intensities, 'k.')                                  # raw data
plot(times, np.convolve(intensities, window(10), 'same'), 'r')  # 10-sample (1-second) moving average
plot(times, np.convolve(intensities, window(100), 'same'), 'b') # 100-sample (10-second) moving average
show()

(Plot: the raw intensities as black dots, with the 10-sample moving average in red and the 100-sample moving average in blue.)

mtadd
  • This is nice, and note that you can use any window besides a flat one (using a gaussian(size) for example instead of ones(size)). But be warned about the **edge effects**, which might not be so obvious in certain data. – askewchan Apr 12 '13 at 03:11
  • Thanks for the great solution! I used plot(times[::10],np.convolve(intensities,window(10),'same')[::10],'r') to simulate what it would look like if I was really only collecting one data point for every 10 data points that I was before. Thanks again! And yes, as askewchan mentions, there are edge effects with the convolution. You can use 'valid' instead of 'same' in np.convolve, but then the array will no longer be the same size as the times array. – DanHickstein Apr 12 '13 at 04:05
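To make the Gaussian-window and edge-effect points from the comments concrete, here is a small sketch (assuming SciPy is available; gaussian_window is just an illustrative helper, not part of the answer above):

import numpy as np
from scipy.signal.windows import gaussian

def gaussian_window(size, std=None):
    # Gaussian weights normalized to sum to 1, so the average level is preserved
    w = gaussian(size, std if std is not None else size / 4.0)
    return w / w.sum()

# 'same' keeps the output the same length as the input, but np.convolve
# zero-pads beyond the ends, so the first and last few points are biased low.
smoothed_same = np.convolve(intensities, gaussian_window(10), 'same')

# 'valid' avoids those edge effects, at the cost of a shorter output array
# that no longer lines up one-to-one with the times array.
smoothed_valid = np.convolve(intensities, gaussian_window(10), 'valid')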

You could reshape the data to group it into groups of 10, 50, or 100. Then call the mean(axis=-1) method to take the average over the last axis (the axis of size 10, 50, or 100):

With this setup:

In [10]: import numpy as np

In [11]: times = np.linspace(0,100,1000)

In [12]: intensities = np.random.rand(len(times))

Here are the means of every 10 values:

In [13]: intensities.reshape(-1,10).mean(axis=-1)
Out[13]: <output omitted due to length>

means of every 50 values:

In [14]: intensities.reshape(-1,50).mean(axis=-1)
Out[14]: <output omitted due to length>

means of every 100 values:

In [15]: intensities.reshape(-1,100).mean(axis=-1)
Out[15]: 
array([ 0.50969463,  0.5095131 ,  0.52503152,  0.49567742,  0.52701341,
        0.53584475,  0.54808964,  0.47564486,  0.490907  ,  0.50293636])

arr.reshape(-1, 10) tells NumPy to reshape the array arr to have a shape with size 10 in the last axis. The -1 tells NumPy to give the first axis whatever size is necessary to fill the array.

Note that using reshape in this way requires that len(intensities) is evenly divisible by the size (e.g. 10, 50, 100) that you want to group by.

unutbu
  • This is a clever solution, but it only works when the window size is a factor of `len(intensities)`. You'd have to cut off some data if `len(intensities)` is prime or has a more awkward set of factors, as in: `q = len(intensities)%n; intensities[:q].reshape...` – askewchan Apr 12 '13 at 03:28
  • Thanks for the solution! This one seems like it should be very quick if working with a large dataset. – DanHickstein Apr 12 '13 at 04:10
  • @Dan And in my comment above, of course you'd have to use `[:-q or None]` not just `[:q]`. Apparently I was asleep when I wrote my answer and comment – askewchan Apr 12 '13 at 13:10
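As a concrete version of the trimming discussed in the comments above, here is a sketch that handles lengths that are not a multiple of the bin size (n and m are just illustrative names):

import numpy as np

n = 10                         # bin size (samples per bin)
m = len(intensities) // n      # number of complete bins
binned = intensities[:m * n].reshape(m, n).mean(axis=1)

# Bin the time axis the same way so each averaged value has a matching time stamp
binned_times = times[:m * n].reshape(m, n).mean(axis=1)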