
I'm currently working on a benchmark project in which I run different tests and store their runtimes in files. I then take those outputs and create a graph that compares each test run with its own earlier results. Some of the code is quite complex, and it has been puzzling me for a while why I sometimes get more plots than I should. I will try to provide all relevant information.

Each test has multiple datasets, and the results are stored in .txt files in this format:

name: runtime
Example: dataset #1: 8198

I find all the older results with glob.glob, and that works fine: it only finds older results from the same test, so I know the issue is not in finding the results. The results themselves are also fine. Occasionally there is an invalid result, but I filter those out, so only valid results are used.
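A minimal sketch of parsing one line of such a file, assuming each line is `name: runtime` with an integer runtime and that anything else counts as invalid (the question does not say what makes a result invalid). `parse_result_line` is a hypothetical helper, not the question's `getValidResults`. It splits on the last colon with `str.rpartition`, so dataset names that themselves contain a colon still parse:

```python
def parse_result_line(line):
    """Split a 'name: runtime' line on its LAST colon.

    Returns (name, runtime) or None if the line is malformed.
    Assumption: a valid runtime is a plain integer.
    """
    name, sep, value = line.rpartition(":")
    if not sep:
        return None  # no colon at all: not a result line
    try:
        runtime = int(value.strip())
    except ValueError:
        return None  # non-integer runtime is treated as invalid
    return name.strip(), runtime


print(parse_result_line("dataset #1: 8198"))    # ('dataset #1', 8198)
print(parse_result_line("dataset #2: FAILED"))  # None
```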

I take the path of the result I have just produced and create a graph by finding all the older results. I create a list (called x) holding the positions 0 to n-1, where n is the number of outputs, and use it for the x-ticks, since I want custom tick labels on the x-axis (the commit names).

import matplotlib.pyplot as plt

# Get all older and current results from a given test
outputs = getValidResults(path)
# Stores the runs in a list like:
# [[dataset1 run1, dataset1 run2], [dataset2 run1, dataset2 run2]]
runs = getAllRuns(outputs)
# Names used for xticks
commits = getAllCommits(path)
# Name of the test
testName = getTestName(path)

# x positions 0..len(outputs)-1, labelled with commit names via plt.xticks
x = list(range(len(outputs)))

for n in range(0, dataset_amount):
    y = []
    for run in runs:
        y.append(run[n])
    plt.xticks(x, commits, rotation=70)
    logger.debug('x is: {} and y is: {}'.format(x,y))
    plt.plot(x, y, 'o-')
plt.ylabel('Build/run time in microseconds')
plt.xlabel('Commit')
lgd = plt.legend(datasets, bbox_to_anchor=(1, 0.5), loc='center left', fancybox=True)
plt.title(testName)
plt.grid(True)
plt.tight_layout()
plt.savefig(savePath, bbox_extra_artists=(lgd,), bbox_inches='tight')
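The nested loop above collects the n-th value of every run, which amounts to transposing `runs`. Assuming `runs` is ordered run-first (one inner list per run, one value per dataset), which is how `run[n]` indexes it (the comment in the code describes a dataset-first layout instead), the same per-dataset series can be sketched with `zip(*runs)`. The numbers below are hypothetical:

```python
# Hypothetical data: 3 runs (commits) x 2 datasets, ordered run-first
runs = [[8198, 512], [8201, 498], [8150, 505]]

# One x position per run, as in the question
x = list(range(len(runs)))

# zip(*runs) regroups the values column by column: one tuple per dataset
series = [list(col) for col in zip(*runs)]

print(x)       # [0, 1, 2]
print(series)  # [[8198, 8201, 8150], [512, 498, 505]]
```

When `runs` is empty, `zip(*runs)` yields no series at all, whereas the original loop still calls `plt.plot(x, y)` once per dataset with empty lists, which matches the `x is: [] and y is: []` log line.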

Note that I log x and y every time a line is plotted. The log shows that the plot call ran only once, and both lists were empty:

[2016-06-02 11:29:16,684] - {DEBUG:htmlgen.py:197} - x is: [] and y is: []

However, it still saves the graph, and this is what comes out:

[Image: graph with the wrong plot]

The results are all wrong. First of all, as you can see in the legend, this test has only one dataset; it could not possibly have more than one. Another interesting thing is that the results on the graph come from another test whose graph was created just before this one. This leads me to believe that it somehow reuses the same results. Could it be a memory problem?

I have tried to log every variable, and they all come out empty, except for the datasets, which are correct:

[2016-06-02 11:29:16,650] - {DEBUG:htmlgen.py:167} - Average is: []
[2016-06-02 11:29:16,650] - {DEBUG:htmlgen.py:168} - Commits found: [].
[2016-06-02 11:29:16,650] - {DEBUG:htmlgen.py:169} - Datasets found: ['fluid-n_steps=1-n_solver_steps=40-grid_res=100.input:'].
[2016-06-02 11:29:16,650] - {DEBUG:htmlgen.py:170} - Dataset length: 1
[2016-06-02 11:29:16,650] - {DEBUG:htmlgen.py:171} - Outputs found: []
[2016-06-02 11:29:16,683] - {DEBUG:htmlgen.py:180} - Trying to create plot points...
[2016-06-02 11:29:16,684] - {DEBUG:htmlgen.py:197} - x is: [] and y is: []
[2016-06-02 11:29:16,686] - {DEBUG:htmlgen.py:199} - Successfully created plot points.

It does not find any results at all, yet it still produces a graph with data on it, even though nothing is actually plotted in the code.

Also interesting: the first graph I make is always fine, but from the second one onward, the graphs reuse the results from the graph created first.
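The symptom can be reproduced with a minimal sketch, assuming the graphs are drawn one after another in the same Python process through pyplot's implicit "current figure", as the code above does. `make_graph` is a hypothetical stand-in for the plotting code, and it saves to an in-memory buffer instead of `savePath`:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt


def make_graph(x, y):
    """Hypothetical stand-in: plot onto pyplot's current figure and save it."""
    plt.plot(x, y, "o-")  # no new figure is created here
    buf = io.BytesIO()
    plt.savefig(buf, format="png")  # saving does NOT clear the figure
    return len(plt.gca().lines)  # lines currently on the axes


first = make_graph([0, 1, 2], [5, 6, 7])
second = make_graph([], [])  # empty data, like the failing test
print(first, second)  # 1 2
```

The second call adds an empty line to the axes that still hold the first graph's line, so the second image shows the earlier test's data.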

Has anybody experienced or seen anything like this, where matplotlib reuses data points from graphs created earlier instead of from the graph actually being made?

cenh

1 Answer


You should clear the figure with plt.clf() after saving it; otherwise each new graph is drawn onto the same figure, on top of the lines from the previous one.
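A minimal sketch of the fix, saving to in-memory PNGs (`io.BytesIO`) instead of a real `savePath` so it runs anywhere; both helper names are hypothetical. The first function follows this answer (`plt.clf()` after saving); the second is a common alternative that gives each graph its own figure via `plt.subplots()` and releases it with `plt.close(fig)`:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt


def make_graph_clf(x, y):
    """Approach from the answer: draw, save, then clear the current figure."""
    plt.plot(x, y, "o-")
    buf = io.BytesIO()
    plt.savefig(buf, format="png")
    n = len(plt.gca().lines)
    plt.clf()  # wipe lines, ticks, legend, title so the next graph starts empty
    return n


def make_graph_own_figure(x, y):
    """Alternative: one Figure per graph, closed once it is saved."""
    fig, ax = plt.subplots()
    ax.plot(x, y, "o-")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    n = len(ax.lines)
    plt.close(fig)  # drop pyplot's reference to this figure
    return n


a = [make_graph_clf([0, 1, 2], [5, 6, 7]), make_graph_clf([], [])]
b = [make_graph_own_figure([0, 1, 2], [5, 6, 7]), make_graph_own_figure([], [])]
print(a, b)  # [1, 1] [1, 1]
```

Closing, rather than only clearing, also matters when many graphs are generated in one process, because pyplot keeps every open figure alive until it is closed.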

See this answer for more info.
