
I found several questions similar to mine, but the answers don't work in my case. I think my case is different, and I need some help to figure out what it is.

My script reads two files containing 50000 values for x and y, and I plot these values with matplotlib's `plot` function.

I inserted some print statements to figure out why it takes more than a minute, sometimes longer. The last "DONE!" (before `plt.show()`) appears after, let's say, 2 seconds, but then everything hangs.

Sometimes I get a picture within a minute, and sometimes the same thing takes 5 minutes or I have to kill the process.

Can anyone help? I am working on a Mac from 2012...

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm

# x and data_dict are loaded from the two input files (not shown)
colors = cm.rainbow(np.linspace(0, 1, len(data_dict.keys())))

fig, ax = plt.subplots(dpi=150)
for key, c in zip(data_dict.keys(), colors):
    ax.plot(x,
            data_dict[key],
            label=key,
            color=c,
            alpha=.5)
    print("%s DONE!" % (key))

ax.axhline(1, color='red', linestyle='--')
ax.grid(True)
ax.set_xlabel("Zeit in ns")
ax.set_ylabel("Distanz in nm")
legend = ax.legend()
print("DONE!")
plt.show()
```
drmariod
  • What does `data_dict` look like? (You mentioned `y` in your question, but there is no `y` in your code?) How many times is that `for-loop` calling `ax.plot`? On a typical machine you won't get good performance if you are calling `ax.plot` thousands of times... – unutbu Mar 04 '14 at 14:32
  • In this case it gets called 2 times. `x = [2.0,4.0,6.0,...]`; `data_dict = {'filename' : [2305.0,3456.0,...], 'filename' : [2305.0,3456.0,...]}` – drmariod Mar 04 '14 at 14:36
  • Since you see the final 'Done' before the `plt.show()` and the delay appears to happen in there, try `fig.savefig('output.jpg')` to see how long it takes to actually create the image, and check whether it can be brought up with a different display method such as `subprocess.Popen(['firefox', '-new-tab', 'output.jpg'])`. I had to do this because I needed to bring up the images in a non-blocking way so that further processing could continue. – sabbahillel Mar 04 '14 at 14:47
  • You should show the questions and answers that do not work. Your link is pointing to this question and not the answer that does not work. – sabbahillel Mar 04 '14 at 14:51
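Following sabbahillel's suggestion, here is a minimal sketch for timing the rendering step on its own, without any GUI window. It assumes random stand-in data of the same size as in the question and uses the non-interactive Agg backend; it writes `output.png` to the current directory.

```python
import time

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files only
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data with the question's size (50000 points per line)
x = np.arange(2.0, 100002.0, 2.0)
ys = {"A": np.random.random(x.size), "B": np.random.random(x.size)}

fig, ax = plt.subplots(dpi=150)
for key, y in ys.items():
    ax.plot(x, y, label=key, alpha=0.5)
ax.legend()

start = time.perf_counter()
fig.savefig("output.png")  # the actual drawing of all points happens here
elapsed = time.perf_counter() - start
print("savefig took %.2f s" % elapsed)
plt.close(fig)
```

If `savefig` is fast but `plt.show()` is slow, the bottleneck is more likely the interactive backend than the amount of data.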

1 Answer


A typical monitor displays about 100 dots per inch (dpi). You are plotting 50K points. Even if the points were placed side by side, one pixel each, you would still need 50000 / 100 = 500 inches of screen to display them all individually. Usually a graph would have a bit of space between the points, which would make the number of required inches even greater.

To display the image on screen, matplotlib squeezes it into a window of perhaps 800 x 600 pixels. Thus 50K x-values share only about 800 pixel columns, roughly 60 points per column.
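The pixel budget can be checked directly. A small sketch, assuming the question's `dpi=150` and matplotlib's default figure size (which depends on the installed version and your rcParams), that reports how many of the 50000 points land on each pixel column of the plotting area:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no window is opened
import matplotlib.pyplot as plt

N = 50000  # points per line, as in the question

fig, ax = plt.subplots(dpi=150)  # same dpi as the question's code
fig_w_px = fig.get_figwidth() * fig.dpi  # whole figure width in pixels
axes_w_px = ax.bbox.width                # width of the plotting area in pixels
print("figure: %.0f px, axes: %.0f px wide" % (fig_w_px, axes_w_px))
print("roughly %.0f points per pixel column" % (N / axes_w_px))
plt.close(fig)
```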

In other words, you are making matplotlib work hard plotting a lot of points which get short shrift in the final image.

Obviously, the images should summarize data in a humanly understandable way. Since we can't really take in 50K distinguishable points, you should probably downsample your data. One crude way is to take every 100th point:

```python
x = x[::100]
```
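Whatever stride is used, the same slice has to be applied to every y series so that each one still lines up with `x`. A sketch with hypothetical stand-in data shaped like the question's (`x = [2.0, 4.0, 6.0, ...]`, two series in `data_dict`):

```python
import numpy as np

# Hypothetical stand-ins for the data read from the two files
x = np.arange(2.0, 100002.0, 2.0)  # 50000 time values: 2.0, 4.0, ...
data_dict = {'A': np.random.random(50000), 'B': np.random.random(50000)}

step = 100  # keep every 100th point
x_small = x[::step]
# Slice every series with the same stride (np.asarray also accepts lists)
data_small = {key: np.asarray(y)[::step] for key, y in data_dict.items()}

print(len(x_small))
```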

Or, you might take the average of every 100 points (this requires `x` to be a NumPy array whose length is a multiple of 100):

```python
x = x.reshape(-1, 100).mean(axis=1)
```
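When the length is not an exact multiple of 100 (for example 50001 values), `reshape(-1, 100)` raises a `ValueError`. One hedged workaround, assuming it is acceptable to drop the few leftover values at the end, is to trim before reshaping. `block_mean` is a hypothetical helper name:

```python
import numpy as np

def block_mean(values, n):
    """Average consecutive blocks of n values.

    Leftover values that do not fill a whole block are dropped
    (an assumption; keep them separately if they matter).
    """
    arr = np.asarray(values, dtype=float)  # also accepts plain Python lists
    usable = len(arr) - len(arr) % n       # largest multiple of n
    return arr[:usable].reshape(-1, n).mean(axis=1)

x = [float(v) for v in range(50001)]  # hypothetical list with 50001 values
means = block_mean(x, 100)
print(means[:3])
```

Apply the same helper to each y series so that all arrays keep the same length.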

```python
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(1)
N = 49999  # not a multiple of 100, so the last chunk is shorter

def chunks(seq, n):
    # http://stackoverflow.com/a/312464/190597 (Ned Batchelder)
    """ Yield successive n-sized chunks from seq."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

def downsample(seq, n):
    # Average each chunk (float() avoids integer division on Python 2)
    return [sum(chunk)/float(len(chunk)) for chunk in chunks(seq, n)]

x = range(N)
x = downsample(x, 100)
data_dict = {'A' : np.random.random(N), 'B' : np.random.random(N)}
colors = plt.cm.rainbow(np.linspace(0, 1, len(data_dict.keys())))

fig, ax = plt.subplots(dpi=150)
for key,c in zip(data_dict.keys(),colors):
    y = downsample(data_dict[key], 100)
    ax.plot(x,
            y,
            label=key,
            color=c,
            alpha=.5)
    print("%s DONE!" % (key))

ax.axhline(1,color='red',linestyle='--')
ax.grid(True)
ax.set_xlabel("Zeit in ns")
ax.set_ylabel("Distanz in nm")
legend = ax.legend()
print("DONE!")
# plt.show()
plt.savefig('/tmp/test.png')
```
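The `chunks`/`downsample` pair loops in Python, which is fine for 50K values. If it ever becomes a bottleneck, a vectorized variant can be sketched with NumPy's `np.add.reduceat`; like `downsample`, it keeps a shorter final block. `downsample_fast` is a hypothetical name, not part of NumPy:

```python
import numpy as np

def downsample_fast(seq, n):
    """Vectorized block average that keeps a shorter final block."""
    arr = np.asarray(seq, dtype=float)
    starts = np.arange(0, len(arr), n)             # first index of each block
    sums = np.add.reduceat(arr, starts)            # sum of each block
    counts = np.diff(np.append(starts, len(arr)))  # size of each block
    return sums / counts

y = downsample_fast(np.arange(49999), 100)
print(len(y), y[0], y[-1])
```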


unutbu
  • `x = x[::100]` did the trick, thanks! I tried to do the averaging as you recommended, but I can't convert my list into this np object. Do you have a suggestion? My x and the ys are just lists of floats. Thanks in advance! – drmariod Mar 04 '14 at 15:18
  • And by the way, you are right: my monitor can't resolve all the points, but the line gets bolder in areas with a lot of variation, so I thought I needed all of them. I didn't think that maybe 5000 would be enough, but it still looks great. – drmariod Mar 04 '14 at 15:20
  • If `x` is a list, you can convert it to a NumPy array with `x = np.array(x)`. Then you can use `x = x.reshape(-1, 100).mean(axis=1)` to take the average of every 100 points. – unutbu Mar 04 '14 at 15:28
  • Hm, I get a ValueError on this: `ValueError: total size of new array must be unchanged`. `print(type(x)); x = np.array(x); print(type(x)); x = x.reshape(-1, 100).mean(axis=1)` first prints type `list` and then `numpy.ndarray`, but execution stops because of the ValueError. I wonder why your code works... :-/ – drmariod Mar 04 '14 at 15:42
  • It sounds like `len(x)` is not evenly divisible by 100. I'll edit the code above to show how to take a downsampling average in this case. – unutbu Mar 04 '14 at 16:11
  • That was the problem. I didn't think about the exact length; I had 50001 values. – drmariod Mar 04 '14 at 16:22