Plotting moving average with numpy and csv

Question

I need help plotting a moving average on top of the data I am already able to plot (see below)

I am trying to make m (my moving average) equal to the length of y (my data) and then within my 'for' loop, I seem to have the right math for my moving average.

What works: plotting x and y

What doesn't work: plotting m on top of x & y, which gives me this error:

RuntimeWarning: invalid value encountered in double_scalars

My theory: I set m to an array of zeros with the same shape as y, and then my for loop overwrites each zero with the moving average computed inside the loop.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import csv
import math


def graph():
    date, value = np.loadtxt("CL1.csv", delimiter=',', unpack=True,
                         converters = {0: mdates.strpdate2num('%d/%m/%Y')})
    fig = plt.figure()

    ax1 = fig.add_subplot(1,1,1, axisbg = 'white')

    plt.plot_date(x=date, y=value, fmt = '-')

    y = value
    m = np.zeros(y.shape)
    for i in range(10, y.shape[0]):
       m[i-10] = y[i-10:1].mean()

    plt.plot_date(x=date, y=value, fmt = '-', color='g')
    plt.plot_date(x=date, y=m, fmt = '-', color='b')

    plt.title('NG1 Chart')
    plt.xlabel('Date')
    plt.ylabel('Price')

    plt.show()

graph ()

lmjohns3 · Answer 1 · 2013-08-19T14:46:48.277

The problem here lies in your computation of the moving average: you just have a couple of off-by-one problems in the indexing.

y = value
m = np.zeros(y.shape)
for i in range(10, y.shape[0]):
   m[i-10] = y[i-10:1].mean()

Here you've got everything right except for the :1] at the end of the slice. This tells the interpreter to take a slice starting at i-10 and ending just before index 1. Once i-10 is 1 or larger, that slice is an empty array, and the mean of an empty array is nan, which is what raises the RuntimeWarning. To fix it, just replace 1 with i.

Additionally, your range needs to be extended by one at the end. Replace y.shape[0] with y.shape[0]+1.
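
With both fixes applied, the loop can be sketched roughly as below. The function name moving_average_loop and the n parameter are made up for illustration, and this version returns only the full windows instead of filling a zeros array the size of y.

```python
import numpy as np

def moving_average_loop(y, n=10):
    """Mean of each full n-point window of y, computed with an explicit loop."""
    y = np.asarray(y, dtype=float)
    if n < 1 or n > len(y):
        raise ValueError("n must be between 1 and len(y)")
    m = np.zeros(len(y) - n + 1)
    # i is the exclusive end of each window, so it has to reach len(y)
    for i in range(n, len(y) + 1):
        m[i - n] = y[i - n:i].mean()
    return m

# Small check with a 3-point window over 0..9
print(moving_average_loop(np.arange(10), n=3))
```

The result has len(y) - n + 1 entries, so it needs a correspondingly shortened date array when plotted.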

Alternative

I just thought I'd mention that you can compute the moving average more automatically by using np.convolve (see the NumPy docs):

m = np.convolve(y, [1. / 10] * 10, 'same')

In this case, m will have the same length as y, but the moving average values might look strange at the beginning and end. This is because 'same' effectively causes y to be padded with zeros at both ends so that there are enough y values to use when computing the convolution.

If you'd prefer to get only moving average values that are computed using values from y (and not from additional zero-padding), you can replace 'same' with 'valid'. In this case, as Ryan points out, m will be shorter than y (more precisely, len(m) == len(y) - len(filter) + 1), which you can address in your plot by removing the first or last elements of your date array.
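
To make the difference between the two modes concrete, here is a small sketch on made-up data. The array y, the window length n and the integer x array are stand-ins for the price and date columns, not values from the question.

```python
import numpy as np

y = np.arange(10, dtype=float)   # stand-in for the price column
n = 3                            # window length, chosen for illustration
kernel = np.ones(n) / n          # equal weights that sum to 1

m_same = np.convolve(y, kernel, 'same')    # zero-padded at both ends
m_valid = np.convolve(y, kernel, 'valid')  # full windows only

print(len(y), len(m_same), len(m_valid))

# Each 'valid' value is the mean of n consecutive points. Pairing it with
# the last x value of its window means dropping the first n - 1 x values.
x = np.arange(len(y))            # stand-in for the date array
x_valid = x[n - 1:]
```

Dropping the last n - 1 dates instead would pair each average with the first point of its window, so which end to trim depends on how you want the line positioned against the data.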

lmjohns3! I really appreciate you pointing that out. I believe that the math is still wrong based on the way the moving average is produced. Ex: my goal is to take an equal average of the past 10 data points to plot its average on the 11th data point, if that makes any sense — antonio_zeus, Aug 19 '13 at 00:15
also, I believe the math is wrong because when I see the graph, moving average drops off at the end, as if its calculating the average forward rather than backward — antonio_zeus, Aug 19 '13 at 01:26
@user2692787 yes, that's because the convolution is effectively padding your input signal with zeros when the `'same'` argument is used. I've updated the description. Read the docs for more examples and info, too. — lmjohns3, Aug 19 '13 at 14:47

score 1 · Answer 2 · edited May 23 '17 at 10:31

I think that lmjohns3's answer is correct, but you have a couple of problems with your moving average function. First of all, there is the indexing problem that lmjohns3 pointed out. Take the following data for example:

In [1]: import numpy as np

In [2]: a = np.arange(10)

In [3]: a
Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Your function gives the following moving average values:

In [4]: for i in range(3, a.shape[0]):
   ...:     print a[i-3:i].mean(),
1.0 2.0 3.0 4.0 5.0 6.0 7.0

This result has only 7 values, which is one too few. The last value in the moving average should be (7+8+9)/3=8. To fix that you could change your function as follows:

In [5]: for i in range(3, a.shape[0] + 1):
    ...:     print a[i-3:i].sum()/3,
1 2 3 4 5 6 7 8

The second problem is that in order to plot two sets of data, the total number of data points needs to be the same. Your function returns a new set of data that is smaller than the original data set. (You may not have noticed because you preallocated a zeros array of the same size. Your for loop will always produce an array with a bunch of zeros at the end.)

The convolution function gives you the correct data, but with the 'same' argument it returns two extra values (one at each end), which is what makes the new data array the same size as the original.

In [6]: np.convolve(a, [1./3]*3, 'same')
Out[6]: 
array([ 0.33333333,  1.        ,  2.        ,  3.        ,  4.        ,
        5.        ,  6.        ,  7.        ,  8.        ,  5.66666667])

As an alternate method, you could vectorize your code by using Numpy's cumsum function.

In [7]: cs = np.cumsum(a)

In [8]: (cs[3-1:] - np.append(0,cs[:-3]))/3.
Out[8]: array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.])

(This last one is a modification of the answer in a previous post.)

The trick might be that you should drop the first values of your date array. For example use the following plotting call, where n is the number of points in your average:

plt.plot_date(x=date[n-1:], y=m, fmt = '-', color='b')
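
Putting the cumsum approach and the trimmed x array together might look like the sketch below. The function name moving_average_cumsum is hypothetical, the data is made up, and the code assumes 1 <= n <= len(y).

```python
import numpy as np

def moving_average_cumsum(y, n):
    """Mean of each full n-point window of y, using a cumulative sum."""
    cs = np.cumsum(np.asarray(y, dtype=float))
    # cs[k] - cs[k - n] is the sum of the n points ending at index k;
    # prepending 0 covers the first window, which starts at index 0.
    return (cs[n - 1:] - np.append(0, cs[:-n])) / n

y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # made-up data
x = np.arange(len(y))                     # stand-in for the date array
n = 2

m = moving_average_cumsum(y, n)
x_m = x[n - 1:]   # same length as m, so the two can be plotted together
```

Here x_m and m have the same length, which is the condition the plotting call needs.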

score 0 · Answer 3 · answered Aug 20 '13 at 02:05

Okay, either I'm going nuts or it actually worked - I compared my chart vs. another chart and it seemed to have worked.

Does this make sense?

m = np.zeros(y.shape)
for i in range(10, y.shape[0]):
    m[i-10] = y[i-10:i].mean()
plt.plot_date(x=date, y=m, fmt = '-', color='r')
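
One thing to check in this version: range(10, y.shape[0]) stops one window short, and writing to m[i-10] leaves the last entries of m at zero, which shows up as a drop at the end of the line. A possible variant is sketched below, assuming the goal stated in the comments (the average of the past 10 points placed on the 11th). The name trailing_average is made up, and the NaN padding is my own choice, not something from the answers above.

```python
import numpy as np

def trailing_average(y, n=10):
    """m[i] is the mean of the n points before index i; earlier slots are NaN."""
    y = np.asarray(y, dtype=float)
    m = np.full(y.shape, np.nan)   # NaN marks points with no full history yet
    for i in range(n, len(y)):
        m[i] = y[i - n:i].mean()   # y[i] itself is not part of the window
    return m

m = trailing_average(np.arange(6), n=3)
```

Because m keeps the same length as y, it can be plotted against the full date array. Matplotlib leaves gaps in a line where the values are NaN, so the first n points are simply not drawn.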
