Consider the following:
Plot a histogram using R and save it in PDF:
set.seed(42)
x = c(rnorm(1000, 1, 1), rnorm(1000, 8, 3))
pdf("Rplot.pdf", width = 10, height = 3.33)
par(mar = c(4, 5, 0, 0), family = "serif")
hist(x, breaks = 100, border = NA, col = "gray", xlab = "x", ylab = "Frequency",
     cex.lab = 2.75, cex.axis = 2, main = "", las = 1, xaxt = "n")
axis(side = 1, at = seq(-2.5, by = 2.5, len = 30), cex.axis = 2)
dev.off()
Plot a histogram using Python and save it in PDF:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
x = np.concatenate((np.random.normal(1, 1, size = 1000),
                    np.random.normal(8, 3, size = 1000)))
plt.close()
plt.rcParams["figure.figsize"] = (10, 3.33)
plt.rcParams["font.family"] = "Times New Roman"
plt.rcParams["axes.spines.bottom"] = True
plt.rcParams["axes.spines.left"] = True
plt.rcParams["axes.spines.top"] = False
plt.rcParams["axes.spines.right"] = False
tmp = plt.hist(x, bins = 100, color = 'lightgray')
plt.xlabel('x', fontsize = 30)
plt.ylabel('Frequency', fontsize = 30)
tmp = plt.xticks(fontsize = 25)
tmp = plt.yticks(fontsize = 25)
plt.tight_layout()
plt.savefig("pyPlot.pdf", bbox_inches='tight')
Not only is pyPlot.pdf (13 KB) 2.6x the size of Rplot.pdf (5 KB), but if we compare them in Adobe Reader, pyPlot.pdf is also obviously blurrier than Rplot.pdf.
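To make the size comparison reproducible, here is a minimal sketch that computes the ratio from the files on disk. The helper name `size_ratio` is my own, and the script assumes both PDFs sit in the current working directory; it prints nothing if they are missing.

```python
import os


def size_ratio(path_a, path_b):
    """Return size(path_a) / size(path_b), with both sizes in bytes."""
    return os.path.getsize(path_a) / os.path.getsize(path_b)


if __name__ == "__main__":
    # Assumption: both files were produced by the two scripts above.
    if os.path.exists("pyPlot.pdf") and os.path.exists("Rplot.pdf"):
        ratio = size_ratio("pyPlot.pdf", "Rplot.pdf")
        print(f"pyPlot.pdf is {ratio:.1f}x the size of Rplot.pdf")
```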
Some further investigation shows that, if we save both plots in .svg, then they are totally comparable. pyPlot.pdf also appears to be a direct clone of pyPlot.svg in terms of visual quality.
Is it possible to match the visual quality and file size of Rplot.pdf using Matplotlib?
PS: I uploaded the two .pdf files to https://github.com/WhateverLiu/twoImages so you can check the file size and visual quality. Even in Chrome, if you look closely, Rplot.pdf renders smoother labels. But the major problem is that pyPlot.pdf is about 2.6x the size, which really frustrates my work. Is it simply because R performs extra optimization in its graphics device? I don't want to give up on Python yet.
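One thing I can test on the Matplotlib side is font embedding. This is only a guess at one contributor to the size gap, not something the two files confirm. The sketch below saves the same kind of histogram twice: once with Matplotlib's default embedded fonts and once with `pdf.use14corefonts`, which relies on the standard PDF fonts instead of embedding. The helper name `save_hist_pdf` and the output file names are hypothetical, the generic "serif" family stands in for "Times New Roman", and `axes.unicode_minus` is switched off because I assume the core fonts lack the Unicode minus glyph.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def save_hist_pdf(path, core_fonts):
    """Save the two-component histogram to `path`; return its size in bytes."""
    rng = np.random.default_rng(42)
    x = np.concatenate((rng.normal(1, 1, 1000), rng.normal(8, 3, 1000)))
    settings = {
        "font.family": "serif",
        "pdf.use14corefonts": core_fonts,  # True: no font embedding
        "pdf.compression": 9,              # maximum stream compression
        "axes.unicode_minus": False,       # ASCII hyphen for negative ticks
    }
    # rc_context keeps these settings local to this one figure.
    with plt.rc_context(settings):
        fig, ax = plt.subplots(figsize=(10, 3.33))
        ax.hist(x, bins=100, color="lightgray")
        ax.set_xlabel("x", fontsize=30)
        ax.set_ylabel("Frequency", fontsize=30)
        ax.tick_params(labelsize=25)
        fig.savefig(path, bbox_inches="tight")
        plt.close(fig)
    return os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for flag in (False, True):
            size = save_hist_pdf(os.path.join(d, f"hist_{flag}.pdf"), flag)
            print(f"pdf.use14corefonts={flag}: {size} bytes")
```

Comparing the two printed sizes should show how much of the difference, if any, comes from the fonts rather than from the bars themselves.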