
I have the following demonstration code, in which I create a simple scatter plot and save it as a png, a fully vectorized eps and a partly rasterized eps.

For a large number of points I expect the file size of the vectorized eps to be much bigger than that of the png (at least at a reasonable dpi), and this is indeed what I observe. When I rasterize the scatter plot, I would expect the file size to drop back towards the size of the png, since I'm practically just "embedding" the png in an eps, right? However, the rasterized version bloats up by a factor of ~20:

png: 48K, fully vectorized eps: 184K, rasterized eps: 3.8M (openSUSE Linux, Python 3.4.6, matplotlib 2.2.2)

What's the reason for this? Is my understanding of what happens when one rasterizes the plot completely wrong? When I import the png into Inkscape and export it as eps, I get a file (obviously rasterized) that is only marginally larger than the original png.

Demonstration code:

import matplotlib.pyplot as plt
import numpy as np

# Prepare some random data
N = 10000
x = np.random.rand(N)
y = np.random.rand(N)

dpi = 150

# Create a figure and plot some points
fig = plt.figure()
ax = fig.add_subplot(111)

# zorder=0.5 puts the scatter below the rasterization threshold set later
scatter = ax.scatter(x, y, zorder=0.5)

# Save it as png and as unrasterized (fully vectorized) eps
fig.savefig('mesh.png', dpi=dpi)  # 48K
fig.savefig('mesh.eps')  # 184K

# Save it with rasterized points: artists with zorder < 1 are rasterized
ax.set_rasterization_zorder(1)
fig.savefig('mesh_rasterized.eps', dpi=dpi)  # 3.8M!

Thanks in advance!

kuadrat
  • I could imagine that the rasterized image stored in an eps file is not a compressed png but a bitmap, because eps in general needs to be lossless. A bitmap is in general very large compared to png. So when saving some random image as png I got a filesize of 100 kB, while saving the same image as bmp it's 2 MB. – ImportanceOfBeingErnest Jul 31 '18 at 13:13
  • @ImportanceOfBeingErnest Indeed, upon inspection of the `eps` I find that the bulk of the file is made up by a bmp. Turns out my understanding of rasterization was wrong: while I thought the rasterized parts would be turned into png-like images of fixed resolution, they are still with infinite resolution. It's just that without rasterization we get lots of vector instructions to draw rectangles (which may take long to render, depending on the viewer) and with rasterization we get a huge bitmap that is probably faster to render instead. Would it be good etiquette to answer my own question now? – kuadrat Jul 31 '18 at 16:41
  • I guess the polite way would be to ask the commenter first if they would like to answer, but in this case you can safely assume that I would have done so if I really wanted, so yes please go ahead and answer and then don't forget to accept that answer in 2 days time. – ImportanceOfBeingErnest Jul 31 '18 at 16:48
  • Thanks for the hint. I actually wanted to pose my last question more politely, including an intrinsic suggestion for you to answer but I was running out of character space, so I thought "f*** it" and shortened down ;) – kuadrat Jul 31 '18 at 17:01
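The png-versus-bmp point in the first comment can be illustrated with a small toy, sketched here with the standard library only. It builds a hypothetical stand-in for a rendered plot (mostly white pixels with a regular pattern of small blue squares, at 960 x 720 px, which assumes the default 6.4 x 4.8 inch figure at dpi = 150) and compares the raw RGB bitmap size with its deflate-compressed size. PNG also uses deflate, so this roughly mimics why the png stays small; a real scatter plot with random, antialiased markers will compress noticeably less well than this idealized pattern.

```python
import zlib

# Hypothetical stand-in for a rendered plot (not real matplotlib output):
# 960 x 720 px, i.e. an assumed 6.4 x 4.8 inch figure at dpi = 150.
WIDTH, HEIGHT = 960, 720
WHITE = bytes((255, 255, 255))
BLUE = bytes((31, 119, 180))


def render_dots(width, height, spacing=40, size=4):
    """Return raw RGB bytes (3 bytes per pixel, row by row)."""
    blank_row = WHITE * width
    dotted_row = b"".join(
        BLUE if (x % spacing) < size else WHITE for x in range(width)
    )
    return b"".join(
        dotted_row if (y % spacing) < size else blank_row for y in range(height)
    )


raw = render_dots(WIDTH, HEIGHT)      # like an uncompressed bmp
compressed = zlib.compress(raw, 9)    # deflate, the compression png uses

print(f"raw RGB bitmap:     {len(raw):,} bytes")
print(f"deflate-compressed: {len(compressed):,} bytes")
```

The raw size is fixed by the pixel count alone, while the compressed size depends on how repetitive the image content is.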

1 Answer


I'll answer my own question here, but thanks to @ImportanceOfBeingErnest for pointing me in the right direction. The short answer is: I had the wrong understanding of what the rasterized keyword in matplotlib (and rasterization in general) actually does.

The explanation for the file size increase is simply that whatever is rasterized has to be put into the resulting eps as an uncompressed bitmap. Depending on the requested dpi, this may take up less space than the set of vector instructions in the unrasterized case, or a lot more. One can test this by changing the dpi value in the question's demonstration code. At dpi = 10, for example, the rasterized file is clearly smaller, though the plotted points are then rendered at an unbearably low resolution. However, for a rectangular grid, e.g. as produced by pcolormesh, a low dpi can be used without losing "resolution" of the pcolormesh data, as long as each grid cell still covers at least about one pixel.

For completeness, here is an example with pcolormesh where, with a low dpi setting, the resulting rasterized eps is smaller than the vector version:
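The dpi dependence can be made concrete with some back-of-the-envelope arithmetic. This is a sketch under stated assumptions, not taken from matplotlib's source: the whole default 6.4 x 4.8 inch figure is rasterized, each pixel costs 3 bytes (RGB), and the bytes are written as ASCII hex (2 characters per byte), which I believe is how matplotlib's PostScript backend embeds images. Since only the region covered by rasterized artists is likely embedded, treat the result as a rough upper bound. The size grows with the square of the dpi.

```python
# Rough size of an uncompressed RGB bitmap embedded in an eps.
# Assumptions: full default figure (6.4 x 4.8 in) rasterized,
# 3 bytes per pixel, ASCII-hex encoding (2 characters per byte).
FIGSIZE_IN = (6.4, 4.8)


def bitmap_bytes(dpi, figsize=FIGSIZE_IN, bytes_per_pixel=3, hex_encoded=True):
    """Estimate the embedded bitmap size in bytes for a given dpi."""
    width_px = round(figsize[0] * dpi)
    height_px = round(figsize[1] * dpi)
    size = width_px * height_px * bytes_per_pixel
    return size * 2 if hex_encoded else size


for dpi in (10, 50, 150):
    print(f"dpi={dpi:>3}: ~{bitmap_bytes(dpi) / 1e6:.2f} MB")
```

At dpi = 150 the estimate is about 4.1 MB, the same order of magnitude as the 3.8M reported in the question, while at dpi = 10 it drops to under 20 kB.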

import matplotlib.pyplot as plt
import numpy as np

# Prepare some random data
n = 100
N = n*n
data = np.random.rand(N).reshape(n,n)

dpi = 50

# Create a figure and draw the mesh
fig = plt.figure()
ax = fig.add_subplot(111)

# zorder=0.5 puts the mesh below the rasterization threshold set later
mesh = ax.pcolormesh(data, zorder=0.5)

# Save it as png and as unrasterized (fully vectorized) eps
fig.savefig('mesh.png', dpi=dpi)
fig.savefig('mesh.eps')

# Save it with a rasterized mesh: artists with zorder < 1 are rasterized
ax.set_rasterization_zorder(1)
fig.savefig('mesh_rasterized.eps', dpi=dpi)
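Why dpi = 50 is enough for the 100 x 100 mesh can be checked with a quick calculation of how many pixels each grid cell gets. This sketch assumes matplotlib's default rcParams since 2.0 (figure.figsize = 6.4 x 4.8 in, figure.subplot.left/right = 0.125/0.9, figure.subplot.bottom/top = 0.11/0.88); the helper name `pixels_per_cell` is hypothetical. Having at least one pixel per cell is only a rough criterion; cell edges may still be shifted by a pixel.

```python
# Pixels available per cell of an n x n pcolormesh at a given dpi.
# Assumed defaults (matplotlib >= 2.0 rcParams): 6.4 x 4.8 in figure,
# subplot left/right = 0.125/0.9, bottom/top = 0.11/0.88.
FIGSIZE_IN = (6.4, 4.8)
AXES_FRACTION = (0.9 - 0.125, 0.88 - 0.11)


def pixels_per_cell(n, dpi, figsize=FIGSIZE_IN, axes_fraction=AXES_FRACTION):
    """Return (horizontal, vertical) pixels per mesh cell."""
    width_px = figsize[0] * axes_fraction[0] * dpi
    height_px = figsize[1] * axes_fraction[1] * dpi
    return width_px / n, height_px / n


for dpi in (10, 50, 150):
    px_x, px_y = pixels_per_cell(100, dpi)
    verdict = "enough" if min(px_x, px_y) >= 1 else "too coarse"
    print(f"dpi={dpi:>3}: {px_x:.2f} x {px_y:.2f} px per cell ({verdict})")
```

At dpi = 50 each cell gets roughly 2.5 x 1.8 pixels, so the data is still resolved, whereas dpi = 10 would merge neighbouring cells.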

Furthermore, during my research I found a simple "hack" to reduce the file size of an eps (seemingly without loss) using the epstopdf and pdftops commands (tested on Linux), which I hope will be useful to some:

$ epstopdf my.eps  # creates my.pdf
$ pdftops -eps my.pdf  # creates a smaller my.eps (overwriting the old one!)
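I have not checked what pdftops writes internally, but a plausible explanation for the shrinkage is that the round trip re-encodes the image data with a compression filter instead of plain ASCII hex. The toy comparison below (standard library only, hypothetical image data) shows the size of the same bytes as ASCII hex (2 characters per byte) versus deflate compression followed by ASCII85 (5 characters per 4 bytes), both of which are encodings PostScript can decode.

```python
import base64
import zlib


def embedded_sizes(raw):
    """Size in bytes of the same image data under different text encodings."""
    return {
        "raw": len(raw),
        # ASCII hex: every byte becomes two hex characters.
        "hex": len(raw.hex()),
        # Deflate-compress first, then ASCII85-encode the result.
        "flate+ascii85": len(base64.a85encode(zlib.compress(raw, 9))),
    }


# Hypothetical image data: mostly white pixels with an occasional blue one.
pixels = (bytes((255, 255, 255)) * 9 + bytes((31, 119, 180))) * 50_000

for name, size in embedded_sizes(pixels).items():
    print(f"{name:>14}: {size:,} bytes")
```

How much is gained in practice depends on how compressible the rasterized content is.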

And finally, some related questions that helped me reach an understanding:

kuadrat