Consider the following:

  1. Plot a histogram using R and save it in PDF:

     set.seed(42)
     x = c(rnorm(1000, 1, 1), rnorm(1000, 8, 3))
     pdf("Rplot.pdf", width = 10, height = 3.33)
     par(mar = c(4, 5, 0, 0), family = "serif")
     hist(x, breaks = 100, border = NA, col = "gray", 
          xlab = "x", ylab = "Frequency", cex.lab = 2.75, cex.axis = 2,
          main = "", las = 1, xaxt = "n")
     axis(side = 1, at = seq(-2.5, by = 2.5, len = 30), cex.axis = 2)
     dev.off()
    
  2. Plot a histogram using Python and save it in PDF:

    import numpy as np
    import matplotlib.pyplot as plt
    np.random.seed(42)
    x = np.concatenate((np.random.normal(1, 1, size = 1000),
       np.random.normal(8, 3, size = 1000)))
    plt.close()
    plt.rcParams["figure.figsize"] = (10, 3.33)
    plt.rcParams["font.family"] = "Times New Roman"
    plt.rcParams["axes.spines.bottom"] = True
    plt.rcParams["axes.spines.left"] = True
    plt.rcParams["axes.spines.top"] = False
    plt.rcParams["axes.spines.right"] = False
    tmp = plt.hist(x, bins = 100, color = 'lightgray')
    plt.xlabel('x', fontsize = 30)
    plt.ylabel('Frequency', fontsize = 30)
    tmp = plt.xticks(fontsize = 25)
    tmp = plt.yticks(fontsize = 25)
    plt.tight_layout()
    plt.savefig("pyPlot.pdf", bbox_inches='tight')
    

Not only is pyPlot.pdf (13KB) 2.6x the size of Rplot.pdf (5KB), but when the two are compared in Adobe Reader, pyPlot.pdf is also noticeably blurrier than Rplot.pdf.

Further investigation shows that if both plots are saved as .svg, they are fully comparable. pyPlot.pdf also appears to be a direct clone of pyPlot.svg in terms of visual quality.

Is it possible to generate the level of visual quality and file size of Rplot.pdf using Matplotlib?

PS: I uploaded the two PDFs to https://github.com/WhateverLiu/twoImages; please check the file size and visual quality. Even in Chrome, if you look closely, Rplot.pdf renders smoother labels. But the major problem is that pyPlot.pdf is 2.5x larger, which really frustrates my work. Is it simply because R performs extra optimization in its graphics device? I don't want to give up on Python yet.
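One way to make the comparison concrete is to look at each file's size together with the kinds of fonts it declares. The following is a minimal sketch using only the standard library. It assumes the PDFs store their font dictionaries uncompressed, so that markers such as `/Subtype /Type3` or `/FontFile2` are visible in the raw bytes. The helper name `summarize_pdf` is hypothetical, and a byte scan like this is a heuristic, not a real PDF parser.

```python
import re
from collections import Counter
from pathlib import Path

# Font subtypes that can appear in a PDF font dictionary.
# Assumption: the dictionaries are not hidden inside compressed object streams.
FONT_SUBTYPE = re.compile(
    rb"/Subtype\s*/(Type0|Type1|Type3|TrueType|MMType1|CIDFontType0|CIDFontType2)\b"
)
# /FontFile, /FontFile2 and /FontFile3 entries point at embedded font programs.
FONT_FILE = re.compile(rb"/FontFile[23]?\b")


def summarize_pdf(data: bytes) -> dict:
    """Return the byte size, declared font subtypes, and embedded font file count."""
    subtypes = Counter(m.decode("ascii") for m in FONT_SUBTYPE.findall(data))
    return {
        "size_bytes": len(data),
        "font_subtypes": dict(subtypes),
        "embedded_font_files": len(FONT_FILE.findall(data)),
    }


# File names taken from the question; missing files are skipped.
for name in ("Rplot.pdf", "pyPlot.pdf"):
    path = Path(name)
    if path.exists():
        print(name, summarize_pdf(path.read_bytes()))
```

If the two files report different font subtypes or a different number of embedded font files, that points at font handling as the source of the size gap.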

user2961927
  • Can you post the two images? When I run your Python code I get a pdf with a vectorized figure (I can zoom in without any blurriness) – Brener Ramos Apr 19 '23 at 18:38
  • @BrenerRamos the two pdfs have been uploaded to https://github.com/WhateverLiu/twoImages – user2961927 Apr 19 '23 at 18:51
  • When I open both pdfs in Adobe Reader on my machine, both figures are vectorized. In other words, there is no blurriness. I am not sure if I am missing something. – Brener Ramos Apr 19 '23 at 18:58
  • @BrenerRamos Yes, they are both vectorized, but the fonts in Rplot.pdf are sharper than those in pyPlot.pdf, and Rplot.pdf is only 5KB compared to pyPlot.pdf's 12KB. I am making large LaTeX documents with numerous such plots, and Python gives me figures of both lower visual quality and more than twice the size. – user2961927 Apr 19 '23 at 19:03
  • On my viewer both pdfs look sharp. – Jody Klymak Apr 20 '23 at 00:03
  • @JodyKlymak Both look sharp, but which is sharper? The major problem is that Matplotlib produces 2.5x the file size, which is killing me. – user2961927 Apr 20 '23 at 02:20
  • The Matplotlib fonts are Type 3 instead of TrueType, which are perhaps blurrier in your viewer. You can force TrueType in Matplotlib with `plt.rcParams["pdf.fonttype"] = 42`, but that makes an even larger file because Matplotlib is embedding the font. I assume that R is not embedding the font, which is OK for common fonts but dangerously unportable for less common ones. This font overhead is relatively small and of fixed size, e.g. more complicated plots will just have the same overhead. If you want to optimize, Ghostscript can strip the embedding. – Jody Klymak Apr 20 '23 at 15:34
  • https://stackoverflow.com/questions/60076026/reducing-file-sizes-of-pdfs-created-using-matplotlib-by-changing-font-embedding – Jody Klymak Apr 20 '23 at 15:34
  • @JodyKlymak Please post your comment as an answer if you'd like and I'll accept it – user2961927 Apr 22 '23 at 03:59
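
The `pdf.fonttype` setting mentioned in the comments can be checked empirically. The sketch below is an illustration, not a posted answer: it renders a histogram like the one in the question once with Type 3 fonts and once with TrueType (42), then prints the resulting PDF sizes. It assumes Matplotlib's default font rather than Times New Roman, so that it runs on machines without that font, and the helper name `pdf_size_bytes` is hypothetical. Absolute sizes will differ from the 13KB reported above.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend; savefig(format="pdf") still writes a vector PDF
import matplotlib.pyplot as plt
import numpy as np


def pdf_size_bytes(fonttype: int) -> int:
    """Render a histogram like the question's and return the PDF size in bytes."""
    rng = np.random.default_rng(42)
    x = np.concatenate((rng.normal(1, 1, size=1000), rng.normal(8, 3, size=1000)))
    # pdf.fonttype is read when the PDF is written, so it must be active during savefig.
    with plt.rc_context({"pdf.fonttype": fonttype}):
        fig, ax = plt.subplots(figsize=(10, 3.33))
        ax.hist(x, bins=100, color="lightgray")
        ax.set_xlabel("x", fontsize=30)
        ax.set_ylabel("Frequency", fontsize=30)
        ax.tick_params(labelsize=25)
        buf = io.BytesIO()
        fig.savefig(buf, format="pdf", bbox_inches="tight")
        plt.close(fig)
    return len(buf.getvalue())


# 3 = Type 3 fonts (Matplotlib's default), 42 = TrueType
sizes = {fonttype: pdf_size_bytes(fonttype) for fonttype in (3, 42)}
for fonttype, size in sizes.items():
    print(f"pdf.fonttype={fonttype}: {size / 1024:.1f} KB")
```

Running this with the font actually used in the document shows how much of the file size is font overhead rather than plot content, which is the fixed cost the comment describes.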

0 Answers