
I'm creating plots with matplotlib.pyplot and writing them to PDF. Some of these plots have a large number of points (up to 100,000), many of which overlap, so certain parts of the chart are just a solid mass. (That's okay: I'm interested in what the sparser parts of the graph look like.)

When I save these plots to PDF, the file takes a long time to write, and opening it in a viewer is even slower. Is there a way to store a "lossy" copy of the plot in the PDF? For example, if I took a screenshot of the plot and embedded it in the PDF, it would load a lot faster.

unsorted

1 Answer


I recommend plotting with the option rasterized=True, which embeds the points in the PDF as a bitmap while the axes and text stay vector graphics:

import numpy as np
import matplotlib.pyplot as plt

pts = np.random.rand(2, 100000)
plt.scatter(*pts, rasterized=True)  # draw the points as a bitmap
plt.savefig('tmp_rast.pdf')

For comparison:

plt.figure()  # fresh figure, so the rasterized points are not drawn again
plt.scatter(*pts)
plt.savefig('tmp_reg.pdf')

The resulting file sizes:

$ ls -lh tmp*.pdf
177K Dec  9 22:03 tmp_rast.pdf
1.5M Dec  9 22:02 tmp_reg.pdf
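The comparison above can also be run as one self-contained script. This is only a sketch: the helper names save_scatter and compare_sizes are made up for illustration, and the dpi value of 150 is an assumed setting, not something from the original test. The dpi passed to savefig sets the resolution of the rasterized points, so a higher value gives a sharper but larger bitmap. The exact sizes will vary with the matplotlib version and the data.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def save_scatter(path, pts, rasterized, dpi=150):
    """Save a scatter plot of pts to path and return the file size in bytes.

    Hypothetical helper; dpi=150 is an assumed value that controls the
    resolution of the rasterized points in the saved PDF.
    """
    fig, ax = plt.subplots()
    ax.scatter(*pts, rasterized=rasterized)
    fig.savefig(path, dpi=dpi)
    plt.close(fig)  # free the figure so repeated calls do not pile up
    return os.path.getsize(path)


def compare_sizes(n_points=100_000, dpi=150, seed=0):
    """Return (rasterized_bytes, vector_bytes) for the same random points."""
    rng = np.random.default_rng(seed)
    pts = rng.random((2, n_points))
    with tempfile.TemporaryDirectory() as tmp:
        rast = save_scatter(os.path.join(tmp, "tmp_rast.pdf"), pts, True, dpi)
        reg = save_scatter(os.path.join(tmp, "tmp_reg.pdf"), pts, False, dpi)
    return rast, reg


rast_bytes, reg_bytes = compare_sizes()
print(f"rasterized: {rast_bytes / 1024:.0f} KiB, vector: {reg_bytes / 1024:.0f} KiB")
```

Creating a separate figure for each file keeps the two plots independent, so the vector version does not also contain the rasterized points.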
askewchan