Matplotlib, alternatives to savefig() to improve performance when saving into a CString object?

Question

I am trying to speed up the process of saving my charts to images. Right now I create a cStringIO object and save the chart into it with savefig, but I would really appreciate any help improving on this. I have to do this operation dozens of times, and the savefig command is very slow; there must be a better way of doing it. I read something about saving it as an uncompressed raw image, but I have no clue how to do it. I'm not tied to Agg either, if I can switch to another, faster backend.

For example:

RAM = cStringIO.StringIO()

CHART = plt.figure(.... 
**code for creating my chart**

CHART.savefig(RAM, format='png')

I have been using matplotlib with FigureCanvasAgg backend.

Thanks!

I don't really know much about this. But you can see if the following help: `format='raw'` or `format='rgba'`. It looks like they produce the same output. — Gary Kerr, Mar 22 '11 at 13:24
Have you tried profiling the code to see where savefig spends most of its time? Have you tried reducing the resolution (dpi parameter) or using other image types (jpeg, gif, tif, if supported)? — Bernhard, Mar 22 '11 at 15:14

score 40 · Answer 1 · edited Jun 20 '20 at 09:12

If you just want a raw buffer, try fig.canvas.print_rgb, fig.canvas.print_raw, etc. (The difference between the two is that raw is RGBA, whereas rgb is RGB. There's also print_png, print_ps, etc.)

This will use fig.dpi instead of the default dpi value for savefig (100 dpi). Still, even comparing fig.canvas.print_raw(f) and fig.savefig(f, format='raw', dpi=fig.dpi), the print_raw version is only insignificantly faster, since it doesn't bother resetting the color of the axis patch, etc., that savefig does by default.

Regardless, though, most of the time spent saving a figure in a raw format is just drawing the figure, which there's no way to get around.
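As a rough sketch of the raw-buffer idea above: this assumes Python 3, where io.BytesIO stands in for cStringIO, and builds the figure directly on FigureCanvasAgg. The figure size and the plotted data are made up purely for illustration.

```python
import io

import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

# Build a figure directly on the Agg canvas (no pyplot, no GUI needed).
# The size and the data are arbitrary, for illustration only.
fig = Figure(figsize=(4, 3), dpi=100)
canvas = FigureCanvasAgg(fig)
ax = fig.add_subplot(111)
ax.plot([0, 1, 2], [0, 1, 4])

# Dump uncompressed RGBA bytes into an in-memory buffer.
ram = io.BytesIO()
canvas.print_raw(ram)
raw_bytes = ram.getvalue()
ram.close()

# The buffer holds height * width * 4 bytes (one byte per RGBA channel),
# so it can be viewed as an image array without any decoding step.
width, height = canvas.get_width_height()
pixels = np.frombuffer(raw_bytes, dtype=np.uint8).reshape(height, width, 4)
```

Note that the raw bytes carry no header, so whoever consumes them has to know the width and height separately.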

At any rate, as a pointless-but-fun example, consider the following:

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

plt.ion()
fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    fig.canvas.draw()

(Figure: Brownian walk animation)

If we look at the raw draw time:

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    fig.canvas.draw()

This takes ~25 seconds on my machine.

If we instead dump a raw RGBA buffer to a cStringIO buffer, it's actually marginally faster at ~22 seconds. (This is only true because I'm using an interactive backend! Otherwise it would be equivalent.)

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    ram = cStringIO.StringIO()
    fig.canvas.print_raw(ram)
    ram.close()

If we compare this to using savefig, with a comparably set dpi:

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    ram = cStringIO.StringIO()
    fig.savefig(ram, format='raw', dpi=fig.dpi)
    ram.close()

This takes ~23.5 seconds. Basically, savefig just sets some default parameters and calls print_raw, in this case, so there's very little difference.
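To make the "very little difference" point concrete, here is a small sketch (Python 3 and io.BytesIO assumed; the figure size, dpi and data are arbitrary) that writes the same figure through both routes and compares the buffer sizes. It only checks sizes and does not claim anything about timing.

```python
import io

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

# Arbitrary small figure, for illustration only.
fig = Figure(figsize=(4, 3), dpi=80)
canvas = FigureCanvasAgg(fig)
fig.add_subplot(111).plot([0, 1], [1, 0])

# Route 1: call the canvas method directly.
direct = io.BytesIO()
canvas.print_raw(direct)

# Route 2: go through savefig, pinning dpi to the figure's own dpi
# so both routes render at the same pixel size.
via_savefig = io.BytesIO()
fig.savefig(via_savefig, format="raw", dpi=fig.dpi)

direct_size = len(direct.getvalue())
savefig_size = len(via_savefig.getvalue())
print(direct_size, savefig_size)
```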

Now, if we compare a raw image format with a compressed image format (png), we see a much more significant difference:

import matplotlib.pyplot as plt
import numpy as np
import cStringIO

fig = plt.figure()
ax = fig.add_subplot(111)
num = 50
max_dim = 10
x = max_dim / 2 * np.ones(num)
s, c = 100 * np.random.random(num), np.random.random(num)
scat = ax.scatter(x,x,s,c)
ax.axis([0,max_dim,0,max_dim])
ax.set_autoscale_on(False)

for i in xrange(1000):
    xy = np.random.random(2*num).reshape(num,2) - 0.5
    offsets = scat.get_offsets() + 0.3 * xy
    offsets.clip(0, max_dim, offsets)
    scat.set_offsets(offsets)
    scat._sizes += 30 * (np.random.random(num) - 0.5)
    scat._sizes.clip(1, 300, scat._sizes)
    ram = cStringIO.StringIO()
    fig.canvas.print_png(ram)
    ram.close()

This takes ~52 seconds! Obviously, there's a lot of overhead in compressing an image.
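If you want to check the raw-versus-png gap on your own machine, something like the following could work. It is only a sketch: it assumes Python 3 with io.BytesIO, the time_writer helper is a hypothetical name introduced here, and the repeat count and figure are arbitrary, so the numbers will not match the ones above.

```python
import io
import time

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def time_writer(write, repeats=5):
    """Call write(buffer) `repeats` times; return (seconds, last bytes written)."""
    data = b""
    start = time.perf_counter()
    for _ in range(repeats):
        buf = io.BytesIO()
        write(buf)
        data = buf.getvalue()
        buf.close()
    return time.perf_counter() - start, data


# Arbitrary figure, for illustration only.
fig = Figure(figsize=(4, 3), dpi=100)
canvas = FigureCanvasAgg(fig)
fig.add_subplot(111).plot(range(10))

# Each call redraws the figure, so both timings include the draw cost;
# the difference between them is roughly the encoding overhead.
raw_seconds, raw_data = time_writer(canvas.print_raw)
png_seconds, png_data = time_writer(canvas.print_png)
print("raw: %.3f s, png: %.3f s" % (raw_seconds, png_seconds))
```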

At any rate, this is probably a needlessly complex example... I think I just wanted to avoid actual work...

Nice example Joe, even if it might be overkill. I'm wondering if you saved the frames drawn by each iteration on the disk and then compiled them offline into an animated gif, or is there some way of compiling the drawn frames "in-stream" into an animated gif? I don't mean using the `animation` module, as I'd like to save animations produced by interactive (mouse-event driven) plots. — achennu, Apr 16 '13 at 07:32
Well, did some searching and I suppose your suggestion might be that shown here: http://stackoverflow.com/a/14986894/467522 , right? — achennu, Apr 16 '13 at 07:37
Actually, this particular gif was made by just saving each iteration and compiling them offline (with imagemagick's `convert`). (I think this example predates the release of a matplotlib version with the `animation` module.) At any rate, it should be possible to use `ffmpeg` to create an animated gif, but if I recall correctly, saving as a gif using the `animation` module doesn't work quite correctly. (I may be misremembering, and it may have been fixed by now, regardless. It's been a while since I've tried.) — Joe Kington, Apr 17 '13 at 03:14
Realize this is an old thread but wondering if there's a way to avoid cStringIO. Any pure Matplotlib solution? — so860, Feb 12 '20 at 16:21
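On the last comment's question about skipping the file-like object altogether, one possible sketch follows. It assumes a reasonably recent Matplotlib under Python 3 in which FigureCanvasAgg.buffer_rgba() is available; the figure itself is made up for illustration.

```python
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

# Arbitrary figure, for illustration only.
fig = Figure(figsize=(4, 3), dpi=100)
canvas = FigureCanvasAgg(fig)
fig.add_subplot(111).plot([0, 1, 2], [2, 1, 0])

# The canvas has to be drawn before its pixel buffer is meaningful.
canvas.draw()

# buffer_rgba() exposes the rendered pixels; np.asarray views them as a
# (height, width, 4) uint8 array without going through a file object.
frame = np.asarray(canvas.buffer_rgba())
```

This still pays the full cost of drawing the figure; it only avoids the intermediate buffer.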

score 3 · Answer 2 · edited Oct 10 '12 at 20:00

I needed to quickly generate lots of plots as well. I found that multiprocessing improved the plotting speed with the number of cores available. For example, if 100 plots took 10 seconds in one process, it took ~3 seconds when the task was split across 4 cores.
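A minimal sketch of that approach, assuming Python 3 and the standard multiprocessing.Pool. The function names render_chart and render_all, the default worker count and the plotted data are all hypothetical; this is not the code behind the timings quoted above.

```python
import io
from multiprocessing import Pool

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def render_chart(seed):
    """Render one chart to PNG bytes; safe to run in a worker process."""
    # Each call builds its own Figure and Agg canvas, so workers share no state.
    fig = Figure(figsize=(4, 3), dpi=100)
    canvas = FigureCanvasAgg(fig)
    ax = fig.add_subplot(111)
    ax.plot([seed * x for x in range(10)])
    buf = io.BytesIO()
    canvas.print_png(buf)
    return buf.getvalue()


def render_all(seeds, workers=4):
    """Split the rendering of many charts across `workers` processes."""
    with Pool(processes=workers) as pool:
        return pool.map(render_chart, seeds)
```

Call it as, e.g., render_all(range(100)) from under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that start workers by spawning a fresh interpreter. How much it helps depends on the number of cores and on how expensive each chart is to draw.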

answered Jul 03 '12 at 19:30 by highvelcty · edited Oct 10 '12 at 20:00 by Nikana Reklawyks
