Traceback lines on plot of multiple files

Question

I am plotting data from multiple files. I do not want to use the glob module since I need to plot the data from each file separately. The data plots, but there are 'traceback' lines on the plot when it is graphed using Matplotlib. The image of the plots is below:

Here is some sample data to help solve the problem; I'm sorry about the lack of formatting. The data comes from unformatted text files. If you split the two data sets into two separate files, it should recreate the issue.

Start-Mi, End-Mi,   IRI LWP, IRI R e
194.449,    194.549,    75.1,   92.3
194.549,    194.649,    85.2,   82.8
194.649,    194.749,    90.8,   91.8
194.749,    194.849,    79.3,   73.7
194.849,    194.949,    76.9,   80.1
194.949,    195.049,    82.7,   86.9
195.049,    195.149,    103,    116.7
195.149,    195.249,    81.5,   96.1
195.249,    195.349,    96.7,   92.7
195.349,    195.449,        59.5,   72.2

and

Start-Mi, End-Mi,   IRI LWP, IRI R e
194.449,    194.549,    79.9,   95.7
194.549,    194.649,    87.4,   96.5
194.649,    194.749,    86.5,   105.3
194.749,    194.849,    77, 76
194.849,    194.949,    73.6,   85.2
194.949,    195.049,    81.7,   94.3
195.049,    195.149,    104.6,  128.2
195.149,    195.249,    84.2,   98.6
195.249,    195.349,    94.2,   91.3
195.349,    195.449,    57.5,   72.1

The traceback lines are created when the code begins a new data plot on a new file. I'm trying to get rid of the horizontal lines drawn from the end of the plot back to the beginning. I need to clean up the plot since the code is designed to iterate over an indefinite number of data files. The code is shown below:

def graphWriterIRIandRut():
    n = 100
    m = 0
    startList = []
    endList = []
    iriRList = []
    iriLList = []
    fileList = []
    for file in os.listdir(os.getcwd()):
        fileList.append(file)
    while m < len(fileList):
        for col in csv.DictReader(open(fileList[m],'rU')):
            startList.append(float(col['Start-Mi']))
            endList.append(float(col['  End-Mi']))
            iriRList.append(float(col[' IRI R e']))
            iriLList.append(float(col['IRI LWP ']))

        plt.subplot(2, 1, 1)
        plt.grid(True)
        colors = np.random.rand(n)
        plt.ylabel('IRI value',fontsize=12)
        plt.title('Right IRI data per mile for 2016 calibrations: ')
        plt.plot(startList,iriRList,c=colors)
        plt.tick_params(axis='both', which='major', labelsize=8)

        plt.subplot(2, 1, 2)
        plt.grid(True)
        colors = np.random.rand(n)
        plt.ylabel('IRI value',fontsize=12)
        plt.title('Left IRI data per mile for 2016 calibrations: ')
        plt.plot(startList,iriLList,c=colors)
        plt.tick_params(axis='both', which='major', labelsize=8)

        m = m + 1
        continue

    plt.show()
    plt.gcf().clear()
    plt.close('all')

While you are clearly a beginner at Python, this is a very well phrased question that shows a good amount of thought on your part. Kudos. I felt the need to point that out because it is not often that I see this happen. — Mad Physicist, Sep 08 '16 at 14:27
Do you need to keep the data around for something later, or just plot it? — Mad Physicist, Sep 08 '16 at 14:29
It matters because the solution is somewhat simpler if you do not need to keep the data around. — Mad Physicist, Sep 08 '16 at 14:32
No I don't need to keep the data around in that folder per se. But it would be nice to be able to add files to the folder over time. — , Sep 08 '16 at 14:37
Yes, I'm not a programmer, just an analyst using Python to script as needed — , Sep 08 '16 at 14:38
Kinda. There are a bunch of constructs you use/don't use in a way that shows that you lack familiarity with the language. Your logic is perfectly sound though, and that's the only thing that really matters. I'll address the tangential stuff in my answer too. — Mad Physicist, Sep 08 '16 at 14:45
By the way I do not need to keep the data around in memory. It just needs to be plotted. — , Sep 08 '16 at 15:08

score 0 · Accepted Answer · answered Sep 08 '16 at 15:08

Your code is currently doing the following:

Reading data from a file and appending it to a list
Plotting the list

The list is not cleared at any point, so you keep plotting the list with more and more data appended to it, most of which is being plotted over and over. This is also why all your lines have the same color: it is the color of the last plot you made, which exactly covers all the previous plots and adds one more line.
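To make the accumulation concrete, here is a minimal sketch that needs no plotting library. The rows are the first three `Start-Mi` / `IRI R e` pairs from each of the two sample files in the question; the function names are made up for illustration. It shows that the second plot call receives both files' data, with the x-values jumping backward once at the file boundary, which is where the line is drawn back to the start.

```python
# First three (Start-Mi, IRI R e) rows from each sample file in the question.
file_a = [(194.449, 92.3), (194.549, 82.8), (194.649, 91.8)]
file_b = [(194.449, 95.7), (194.549, 96.5), (194.649, 105.3)]


def accumulate_shared(files):
    """Mimic the question's code: one pair of lists shared by every file."""
    xs, ys = [], []
    snapshots = []
    for rows in files:
        for x, y in rows:
            xs.append(x)
            ys.append(y)
        # A copy of what each plt.plot(xs, ys) call would receive.
        snapshots.append((list(xs), list(ys)))
    return snapshots


def backward_jumps(xs):
    """Count the places where x decreases from one point to the next."""
    return sum(1 for a, b in zip(xs, xs[1:]) if b < a)


snapshots = accumulate_shared([file_a, file_b])
first_xs, _ = snapshots[0]
second_xs, _ = snapshots[1]
print(len(first_xs), backward_jumps(first_xs))
print(len(second_xs), backward_jumps(second_xs))
```

Creating fresh lists inside the loop over files removes the backward jump, because each plot call then sees only one file's rows.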

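As it happens, pyplot keeps the lines already drawn on an axes when you plot on it again, so additional plots on a figure won't overwrite the old ones. In older Matplotlib versions this behavior was controlled by the hold function, which was deprecated in Matplotlib 2.0 and removed in 3.0; on current versions holding is always on, and any plt.hold(True) call should simply be dropped. You don't even need to generate your own color sequence. pyplot will do that for you too.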

While your program is functionally sound, there are also a few "stylistic" issues in your code that can be easily corrected. They are un-Pythonic at best and actually problematic at worst:

Files should be closed after being opened. A context manager, used through the with statement, is the standard approach for this.
There are better ways to copy the result of os.listdir than a for loop. In fact, you don't need to copy the list at all.
If you are writing a while loop that increments an index on every iteration, it should be a for loop.
You never need a continue at the end of a loop. It is implied.
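The four points above can be sketched without any plotting. This is only an illustration under assumptions: the helper names `read_column`, `read_all`, and `demo` are hypothetical, the files are tiny CSVs written to a temporary directory, and the headers are written without the stray spaces that the real files contain.

```python
import csv
import os
import tempfile


def read_column(path, column):
    """Return one CSV column as floats; the with block closes the file."""
    with open(path, newline='') as handle:
        return [float(row[column]) for row in csv.DictReader(handle)]


def read_all(directory, column):
    """Read the same column from every file in a directory, one list per file."""
    results = {}
    # os.listdir already returns a list, so loop over it directly.
    for name in sorted(os.listdir(directory)):
        # A fresh list per file; no index counter and no trailing continue.
        results[name] = read_column(os.path.join(directory, name), column)
    return results


def demo():
    """Write two tiny CSV files to a temporary directory and read them back."""
    with tempfile.TemporaryDirectory() as directory:
        for name, values in (("a.csv", [75.1, 85.2]), ("b.csv", [79.9, 87.4])):
            with open(os.path.join(directory, name), "w", newline="") as handle:
                writer = csv.writer(handle)
                writer.writerow(["Start-Mi", "IRI LWP"])
                for i, value in enumerate(values):
                    writer.writerow([194.449 + 0.1 * i, value])
        return read_all(directory, "IRI LWP")
```

Calling `demo()` returns a dictionary keyed by file name, with one list of floats per file.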

So here is a solution that combines all of the above. This version assumes that you do not need to keep the contents of a given file around after you plot it:

import csv
import os

import matplotlib.pyplot as plt

def graphWriterIRIandRut():
    # Set up the plots
    plt.subplot(2, 1, 1)
    plt.grid(True)
    plt.ylabel('IRI value', fontsize=12)
    plt.title('Right IRI data per mile for 2016 calibrations:')
    plt.tick_params(axis='both', which='major', labelsize=8)
    # Note: plt.hold(True) is omitted; it was removed in Matplotlib 3.0,
    # and repeated plot() calls add to the same axes by default.

    plt.subplot(2, 1, 2)
    plt.grid(True)
    plt.ylabel('IRI value', fontsize=12)
    plt.title('Left IRI data per mile for 2016 calibrations:')
    plt.tick_params(axis='both', which='major', labelsize=8)

    # Iterate over the files in the current directory
    for filename in os.listdir(os.getcwd()):
        # Initialize a new set of lists for each file
        startList = []
        endList = []
        iriRList = []
        iriLList = []

        # Load the file
        with open(filename, 'r') as file:
            for row in csv.DictReader(file):
                startList.append(float(row['Start-Mi']))
                endList.append(float(row[' End-Mi']))
                iriRList.append(float(row[' IRI R e']))
                iriLList.append(float(row['   IRI LWP']))

        # Add new data to the plots
        plt.subplot(2, 1, 1)
        plt.plot(startList, iriRList)
        plt.subplot(2, 1, 2)
        plt.plot(startList, iriLList)

    plt.show()
    plt.close('all')

Running this function on the inputs you provided yields the following figure:

For a more efficient way to work with CSVs and tabular data in general, you may want to check out the pandas library. It is a really powerful tool for analysis which includes plotting and IO routines for most of the use-cases you can probably imagine.

You may want to change the column headings in my answer. They may or may not be accurate for your case, but I had to change a couple of them to get it to run with your inputs. — Mad Physicist, Sep 08 '16 at 15:12
I'm getting a strange error message: Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode? — , Sep 08 '16 at 15:21
I changed `with open(filename, 'r') as file:` to `with open(filename, 'rU') as file:` and that did the trick. — , Sep 08 '16 at 15:27
Otherwise, thank you for your help and information. I'm always willing to learn more and become better at this skill. — , Sep 08 '16 at 15:28
Yeah, sounds like you need the `U`. Are you running on Windows perhaps? — Mad Physicist, Sep 08 '16 at 16:25
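A note for anyone running this on Python 3: the `'U'` flag was removed from `open()` in Python 3.11, and the csv module documentation recommends opening files with `newline=''` instead. The sketch below assumes Python 3 and uses hypothetical helper names (`read_rows`, `demo`); it writes a small file with bare carriage-return line endings, one kind of file that needs universal-newline handling, and reads it back.

```python
import csv
import os
import tempfile


def read_rows(path):
    """Read a CSV file whose lines may end in CR, LF, or CRLF."""
    # newline='' leaves line-ending handling to the csv module.
    with open(path, newline='') as handle:
        return list(csv.DictReader(handle))


def demo():
    """Write a file with bare carriage-return line endings and read it back."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"Start-Mi,IRI LWP\r194.449,75.1\r194.549,85.2\r")
        return read_rows(path)
    finally:
        os.remove(path)
```

Here `demo()` returns two rows, each mapping the header names to the string values from the file.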
