
A simple example:

from matplotlib.pyplot import plot, savefig
from numpy.random import randn

plot(randn(100),randn(100,500),"k",alpha=0.03,rasterized=True)
savefig("test.pdf",dpi=90)

Produces:

[image: the resulting plot]

But the resulting file is ~8 MB. Any idea what's going wrong? Could this be a bug? I'm on Python 3.5.1 and Matplotlib 2.1.2.

AKX
marius

2 Answers


The full answer seems to be in a comment on this answer: https://stackoverflow.com/a/12102852/1078529

The trick is to use set_rasterization_zorder so that everything below a given zorder is rasterized together into a single bitmap:

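from matplotlib.pyplot import gca, plot, savefig
from numpy.random import randn

gca().set_rasterization_zorder(1)  # artists with zorder < 1 are rasterized together
plot(randn(100),randn(100,500),"k",alpha=0.03,zorder=0)
savefig("test.pdf",dpi=90)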
marius

With rasterized=True, you get a PDF with an embedded bitmap (which can be big). With rasterized=False, you get a PDF with tons of embedded line-drawing instructions (which aren't big, but can take a while to render).

With rasterized=False, I get a 374 KiB document.

EDIT: Digging a little deeper, in the rasterized=True document (which clocks in at about 7 megabytes), it looks like every line gets its own bitmap, and they are overlaid:

$ pdfimages -list -all test.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     408   177  rgb     3   8  image  no        12  0    90    90 4192B 1.9%
   1     1 smask     408   177  gray    1   8  image  no        12  0    90    90 7511B  10%
   1     2 image     408   170  rgb     3   8  image  no        13  0    90    90 4472B 2.1%
   1     3 smask     408   170  gray    1   8  image  no        13  0    90    90 7942B  11%
   1     4 image     408   180  rgb     3   8  image  no        14  0    90    90 5454B 2.5%
   1     5 smask     408   180  gray    1   8  image  no        14  0    90    90 9559B  13%
   1     6 image     408   180  rgb     3   8  image  no        15  0    90    90 4554B 2.1%
   1     7 smask     408   180  gray    1   8  image  no        15  0    90    90 8077B  11%
[... 993 more images ...]

For the nonrasterized document, there are no images at all.
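To check the sizes on your own setup, here is a rough sketch that saves the plot from the question three ways (plain vector, rasterized=True on each line, and set_rasterization_zorder) and prints the resulting PDF sizes. The helper name pdf_size and the mode labels are made up for this illustration, and the numbers will likely differ from the ones quoted above depending on the Matplotlib version and figure size.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

MODES = ("vector", "per_artist", "zorder")  # hypothetical labels for this sketch


def pdf_size(mode, n_lines=500, n_points=100, dpi=90):
    """Save the question's plot in the given mode; return the PDF size in bytes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    rng = np.random.default_rng(0)
    x = rng.standard_normal(n_points)
    y = rng.standard_normal((n_points, n_lines))  # one column per line

    fig, ax = plt.subplots()
    if mode == "vector":
        # Every line is stored as drawing instructions.
        ax.plot(x, y, "k", alpha=0.03)
    elif mode == "per_artist":
        # Rasterization is requested on each Line2D individually.
        ax.plot(x, y, "k", alpha=0.03, rasterized=True)
    else:
        # Everything with zorder < 1 is rasterized together.
        ax.set_rasterization_zorder(1)
        ax.plot(x, y, "k", alpha=0.03, zorder=0)

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.pdf")
        fig.savefig(path, dpi=dpi)
        size = os.path.getsize(path)
    plt.close(fig)
    return size


sizes = {mode: pdf_size(mode) for mode in MODES}
for mode, size in sizes.items():
    print(f"{mode:>10}: {size / 1024:.0f} KiB")
```

If the per_artist and zorder sizes come out close to each other, your Matplotlib version is probably not writing a separate bitmap per line; running pdfimages -list on the output is the way to confirm.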

AKX
  • "it looks like every line gets its own bitmap" Ah ok, that must be it... I didn't know it did it in this (inefficient) way. – marius Feb 20 '18 at 11:48