Problems with Smoothing graphs in Python

Question

I have been trying to smooth a plot which is noisy due to the sampling rate I'm using, and what it's counting. I've been using the help on here - mainly Plot smooth line with PyPlot (although I couldn't find the "spline" function and so am using UnivarinteSpline instead)

However, whatever I do I keep getting errors with either the pyplot error that "x and y are not of the same length" or, that the scipi.UnivariateSpline has a value for w that is incorrect. I am not sure quite how to fix this (not really a Python person!) I've attached the code although it's just the plotting bit at the end that is causing problems. Thanks

import os.path
import matplotlib.pyplot as plt
import scipy.interpolate as sci
import numpy as np
def main():
    jcc = "0050"
    dj = "005"
    l = "060"
    D = 20
    hT = 4 * D
    wT1 = 2 * D
    wT2 = 5 * D
    for jcm in ["025","030","035","040","045","050","055","060"]:
        characteristic = "LeadersOnly/Jcm" + jcm + "/Jcc" + jcc + "/dJ" + dj + "/lambda" + l + "/Seed000"
        fingertime1 = []
        fingertime2 = []
        stamp =[]
        finger=[]
        for x in range(0,2500,50):
            if x<10000:
                z=("00"+str(x))
            if x<1000:
                z=("000"+str(x))
            if x<100:
                z=("0000"+str(x))
            if x<10:
                z=("00000"+str(x))
            stamp.append(x)
            path = "LeadersOnly/Jcm" + jcm + "/Jcc" + jcc + "/dJ" + dj + "/lambda" + l + "/Seed000/profile_" + str(z) + ".txt"
            if os.path.exists(path):
                f = open(path, 'r')
                pr1,pr2=np.genfromtxt(path, delimiter='\t', unpack=True)
                p1=[]
                p2=[]
                h1=[]
                h2=[]
                a1=[]
                a2=[]
                finger1 = 0
                finger2 = 0
                for b in range(len(pr1)):
                    p1.append(pr1[b])
                    p2.append(pr2[b])
                for elem in range(len(pr1)-80):
                    h1.append((p1[elem + (2*D)]-0.5*(p1[elem]+p1[elem + (4*D)])))
                    h2.append((p2[elem + (2*D)]-0.5*(p2[elem]+p2[elem + (4*D)])))
                    if h1[elem] >= hT:
                        a1.append(1)
                    else:
                        a1.append(0)
                    if h2[elem]>=hT:        
                        a2.append(1)
                    else:
                        a2.append(0)
                for elem in range(len(a1)-1):
                    if (a1[elem] - a1[elem + 1]) != 0:
                        finger1 = finger1 + 1
                finger1 = finger1 / 2
                for elem in range(len(a2)-1):
                    if (a2[elem] - a2[elem + 1]) != 0:
                        finger2 = finger2 + 1
                finger2 = finger2 / 2
                fingertime1.append(finger1)
                fingertime2.append(finger2)
                finger.append((finger1+finger2)/2)
        namegraph = jcm
        stampnew = np.linspace(stamp[0],stamp[-1],300)
        fingernew = sci.UnivariateSpline(stamp, finger, stampnew)
        plt.plot(stampnew,fingernew,label=namegraph)
    plt.show()      

main()

For information, the data input files are simply a list of integers (two lists seperated by tabs, as the code suggests).

Here is one of the error codes that I get:

0-th dimension must be fixed to 50 but got 300

error                                     Traceback (most recent call last)

/group/data/Cara/JCMMOTFingers/fingercount_jcm_smooth.py in <module>()
    116
    117 if __name__ == '__main__':
--> 118     main()
    119
    120

/group/data/Cara/JCMMOTFingers/fingercount_jcm_smooth.py in main()
     93                 #print(len(stamp))
     94                 stampnew = np.linspace(stamp[0],stamp[-1],300)
---> 95                 fingernew = sci.UnivariateSpline(stamp, finger, stampnew)
     96                 #print(len(stampnew))
     97                 #print(len(fingernew))

/usr/lib/python2.6/dist-packages/scipy/interpolate/fitpack2.pyc in __init__(self, x, y, w, bbox, k, s)
     86         #_data == x,y,w,xb,xe,k,s,n,t,c,fp,fpint,nrdata,ier
     87         data = dfitpack.fpcurf0(x,y,k,w=w,
---> 88                                 xb=bbox[0],xe=bbox[1],s=s)
     89         if data[-1]==1:
     90             # nest too small, setting to maximum bound

error: failed in converting 1st keyword `w' of dfitpack.fpcurf0 to C/Fortran array

We can't reproduce without the LeadersOnly folder, though if you past an exact copy of the traceback you're getting we might be able to figure it out. Also, I made some adjustments to your indentation that the code would run. — David Robinson, Apr 22 '12 at 16:33
Not sure about the solution - but please change your 0-padding to something a bit nicer! The lines initializing z could be replaced by `z = '%06d' % x` — Steve Mayne, Apr 22 '12 at 16:37
have added in one of the error chains that I get.... I get other ones if I fix this by changing to stampnew = np.linspace(stamp[0],stamp[-1],50) but then get further errors, this time with regards to the pyplot function — Cara, Apr 22 '12 at 20:31

score 4 · Answer 1 · answered Sep 02 '12 at 15:12

Let's analyze your code a bit, starting from the for x in range(0, 2500, 50):

You define z as a string of 6 digits padded with 0s. You should really use somestring formatting like z = "{0:06d}".format(x) or z = "%06d" % x instead of these multiple tests of yours.
At the end of your loop, stamp will have (2500//50)=50 elements.
You check for the existence of your file path, then open it and read it, but you never close it. A more Pythonic way is to do:
```
try:
    with open(path,"r") as f:
        do...
except IOError:
    do something else
```
With the with syntax, your file is automatically closed.
pr1 and pr2 are likely to be 1D arrays, right? You can really simplify the construction of your p1 and p2 lists as:
```
p1 = pr1.tolist()
p2 = pr2.tolist()
```
Your lists a1, a2 have the same size: you could combine your for elem in range(len(a..)-1) loops in a single one. You could also use the np.diff function.
at the end of the for x in range(...) loops, finger will have 50 elements minus the number of missing files. As you're not telling what to do in case of a missing file, your stamp and finger lists may not have the same number of elements, which will crash your scipy.UnivariateSpline. An easy fix would be to update your stamp list only if the path file is defined (that way, it always has the same number of elements as finger).
Your stampnew array has 300 elements, when your stamp and finger can only have at most 50. That's a second problem, the size of the weight array (stampnew) must be the same as the size of the inputs.
You're eventually trying to plot fingernew vs stamp. The problem is that fingernew is not an array, it's an instance of UnivariateSpline. You still need to calculate some actual points, for example with fingernew(stamp), then use that in your plot function.

Problems with Smoothing graphs in Python

1 Answers1