MATLAB's smooth function, by default, smooths data using a 5-point moving average. What would be the best way to do the same in Python? For example, if this is my data

0
0.823529411764706
0.852941176470588
0.705882352941177
0.705882352941177
0.676470588235294
0.676470588235294
0.500000000000000
0.558823529411765
0.647058823529412
0.705882352941177
0.705882352941177
0.617647058823529
0.705882352941177
0.735294117647059
0.735294117647059
0.588235294117647
0.588235294117647
1
0.647058823529412
0.705882352941177
0.764705882352941
0.823529411764706
0.647058823529412
0.735294117647059
0.794117647058824
0.794117647058824
0.705882352941177
0.676470588235294
0.794117647058824
0.852941176470588
0.735294117647059
0.647058823529412
0.647058823529412
0.676470588235294
0.676470588235294
0.529411764705882
0.676470588235294
0.794117647058824
0.882352941176471
0.735294117647059
0.852941176470588
0.823529411764706
0.764705882352941
0.558823529411765
0.588235294117647
0.617647058823529
0.647058823529412
0.588235294117647
0.617647058823529
0.647058823529412
0.794117647058824
0.823529411764706
0.647058823529412
0.617647058823529
0.647058823529412
0.676470588235294
0.764705882352941
0.676470588235294
0.647058823529412
0.705882352941177
0.764705882352941
0.705882352941177
0.500000000000000
0.529411764705882
0.529411764705882
0.647058823529412
0.676470588235294
0.588235294117647
0.735294117647059
0.794117647058824
0.852941176470588
0.764705882352941

the smoothed data should be

0
0.558823529411765
0.617647058823530
0.752941176470588
0.723529411764706
0.652941176470588
0.623529411764706
0.611764705882353
0.617647058823530
0.623529411764706
0.647058823529412
0.676470588235294
0.694117647058824
0.700000000000000
0.676470588235294
0.670588235294118
0.729411764705882
0.711764705882353
0.705882352941177
0.741176470588235
0.788235294117647
0.717647058823529
0.735294117647059
0.752941176470588
0.758823529411765
0.735294117647059
0.741176470588235
0.752941176470588
0.764705882352941
0.752941176470588
0.741176470588235
0.735294117647059
0.711764705882353
0.676470588235294
0.635294117647059
0.641176470588236
0.670588235294118
0.711764705882353
0.723529411764706
0.788235294117647
0.817647058823530
0.811764705882353
0.747058823529412
0.717647058823530
0.670588235294118
0.635294117647059
0.600000000000000
0.611764705882353
0.623529411764706
0.658823529411765
0.694117647058824
0.705882352941176
0.705882352941176
0.705882352941176
0.682352941176471
0.670588235294118
0.676470588235294
0.682352941176471
0.694117647058824
0.711764705882353
0.700000000000000
0.664705882352941
0.641176470588236
0.605882352941177
0.582352941176471
0.576470588235294
0.594117647058824
0.635294117647059
0.688235294117647
0.729411764705882
0.747058823529412
0.803921568627451
0.764705882352941

The syntax in MATLAB to get this is

smooth(data)

I want to do the same in Python, but I am unable to find any function that does this.

Divakar
rsnaveen

1 Answer

MATLAB's smooth function is essentially a moving average over sliding windows of length 5, except for how it treats the two elements at each end. As per the linked docs, those boundary cases are computed with these formulae -

yy = smooth(y) smooths the data in the column vector y ..
The first few elements of yy are given by

yy(1) = y(1)
yy(2) = (y(1) + y(2) + y(3))/3
yy(3) = (y(1) + y(2) + y(3) + y(4) + y(5))/5
yy(4) = (y(2) + y(3) + y(4) + y(5) + y(6))/5
...
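
As a cross-check on the formulas quoted above, here is a minimal loop-based sketch that transcribes them directly: each output is the mean over the largest symmetric window (up to the span) that fits around the point. The name `smooth_reference` is made up for illustration, and raising an error on an even span is my assumption, not something taken from the MATLAB docs.

```python
import numpy as np

def smooth_reference(y, span=5):
    """Loop-based moving average with shrinking symmetric windows at the ends."""
    y = np.asarray(y, dtype=float)
    if span % 2 == 0:
        # Assumption for this sketch: reject even spans outright.
        raise ValueError("span must be odd")
    n = len(y)
    half = (span - 1) // 2
    out = np.empty(n)
    for i in range(n):
        # Largest symmetric half-width that fits around index i.
        k = min(i, n - 1 - i, half)
        out[i] = y[i - k:i + k + 1].mean()
    return out

# First six samples from the question.
y = [0, 0.823529411764706, 0.852941176470588,
     0.705882352941177, 0.705882352941177, 0.676470588235294]
yy = smooth_reference(y)
print(yy[:4])
```

The first four printed values should agree with the first four entries of the expected output in the question. Only those are comparable here, because this shortened input ends after six samples.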

So, to replicate the same behavior in NumPy/Python, we can use NumPy's 1D convolution to get the sliding-window sums and divide them by the window length to get the averages. Then we simply add the specially treated values for the boundary elements at both ends.
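
To illustrate the convolution step on its own, here is a small sketch on a toy array (the array and variable names are just for illustration). Convolving with a vector of ones produces the sum over each window, and `'valid'` mode keeps only the positions where the window fits entirely inside the data.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
window = 5

# Sum over each full window; 'valid' drops positions where the
# window would hang over either end of `a`.
window_sums = np.convolve(a, np.ones(window), mode='valid')
window_means = window_sums / window

print(window_sums)   # [15. 20. 25.]
print(window_means)  # [3. 4. 5.]
```

The result has `len(a) - window + 1` entries, which is why the `(window - 1) / 2` boundary values on each side have to be computed separately.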

Thus, we would have an implementation to handle generic window sizes, like so -

import numpy as np

def smooth(a, WSZ):
    # a: 1-D NumPy array (or list) containing the data to be smoothed
    # WSZ: smoothing window size; must be an odd number,
    # as in the original MATLAB implementation
    # Full-window means for the interior points
    out0 = np.convolve(a, np.ones(WSZ, dtype=int), 'valid') / WSZ
    # Shrinking odd-length windows (1, 3, ..., WSZ-2) at both ends
    r = np.arange(1, WSZ - 1, 2)
    start = np.cumsum(a[:WSZ-1])[::2] / r
    stop = (np.cumsum(a[:-WSZ:-1])[::2] / r)[::-1]
    return np.concatenate((start, out0, stop))
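
The edge handling is the least obvious part, so here is a step-by-step sketch of the cumulative-sum trick on a toy array (the data and the intermediate names `head_sums`, `tail_sums` are illustrative only). Every second cumulative sum gives the sum of the first 1, 3, ... values, and these are divided by the matching window lengths in `r`.

```python
import numpy as np

a = np.array([0.0, 3.0, 6.0, 3.0, 0.0, 3.0, 6.0])
WSZ = 5

r = np.arange(1, WSZ - 1, 2)              # edge window lengths: [1 3]
head_sums = np.cumsum(a[:WSZ - 1])[::2]   # sums of the first 1 and first 3 values
tail_sums = np.cumsum(a[:-WSZ:-1])[::2]   # same, counted from the end
start = head_sums / r
stop = (tail_sums / r)[::-1]              # flip back into original order

print(start)  # [0. 3.]
print(stop)   # [3. 6.]

# With an even window, the sums and divisors no longer have matching lengths.
even = 4
n_sums = len(np.cumsum(a[:even - 1])[::2])
n_div = len(np.arange(1, even - 1, 2))
print(n_sums, n_div)  # 2 1
```

The length mismatch in the last lines is what surfaces as a broadcasting `ValueError` when an even window size is passed, so an odd `WSZ` is required.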
Divakar
  • Thanks a lot @Divakar. But I get the following error when I try to use the function 'smooth': 'AttributeError: 'list' object has no attribute 'cumsum'' – rsnaveen Nov 06 '16 at 00:24
  • @rsnaveen I was assuming `a` to be a NumPy array. Fixed it to handle both arrays and lists. – Divakar Nov 06 '16 at 07:27
  • Thanks a lot @Divakar. – rsnaveen Nov 07 '16 at 23:08
  • If I copy and paste your solution and run it over a NumPy array, I get `start = np.cumsum(a[:WSZ-1])[::2]/r ValueError: operands could not be broadcast together with shapes (150,) (149,)` – 00__00__00 Dec 18 '17 at 07:07
  • My array shape was (1436L,), while WSZ=300 – 00__00__00 Dec 18 '17 at 07:07
  • @ErroriSalvo It needs an odd window size, as documented in [MATLAB's smooth docs](https://www.mathworks.com/help/curvefit/smooth.html). It says there - `"yy = smooth(y,span) sets the span of the moving average to span. span must be odd."`. So, use an odd number for WSZ there. – Divakar Dec 18 '17 at 08:55
  • OK, I understand the reason now. Maybe it is worth mentioning in the answer. – 00__00__00 Dec 18 '17 at 08:57