11

I am plotting some data in R using the following commands:

jj = ts(read.table("overlap.txt"))
pdf(file = "plot.pdf")
plot(jj, ylab="", main="")
dev.off()

The result looks like this:

[image: the resulting plot]

The problem is that the resulting PDF file is quite big (25 MB). Is there a way to reduce the file size? JPEG is not an option because I need a vector graphic.

alex
  • Well you did just plot some unknown (large?) number of line segments. In general, pdf is the worst possible way to encode something. Have you tried generating eps or svg with `cairo()` ? – Carl Witthoft Dec 15 '11 at 14:24
  • It doesn't look particularly humungous. The first plot looks a bit detailed. What does summary(jj) say? Is that going up to 4e+05? – Spacedman Dec 15 '11 at 16:55
  • For variables with lots of repeated values, one might be able to put together a solution with `rle` that would drop repeated values and save the time coordinates of the changepoints ... but that would be quite a bit more complex, and not save anything for continuously varying variables – Ben Bolker Dec 15 '11 at 19:01
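A minimal sketch of the `rle` idea from the last comment, assuming a variable that stays constant over long runs. The toy vector `x` and the names `starts`, `ends`, and `keep` are illustrative and not from the question. Only the first and last index of each run are kept, so a line plot through the kept points traces the same path as the full series with fewer vertices.

```r
# Toy series with long constant runs (a stand-in, not the asker's data)
x <- c(rep(0, 5), rep(3, 4), rep(1, 6))

r      <- rle(x)
ends   <- cumsum(r$lengths)        # last index of each run
starts <- ends - r$lengths + 1     # first index of each run
keep   <- sort(unique(c(starts, ends)))

# Same path as plot(seq_along(x), x, type = "l"), drawn with fewer vertices
plot(keep, x[keep], type = "l", xlab = "Time", ylab = "")
```

As the comment says, this saves nothing for a continuously varying series, where every run has length one.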

5 Answers

11

Take a look at tools::compactPDF - you need to have either qpdf or ghostscript installed, but it can make a huge difference to pdf file size.

If reading a PDF file from disk, there are 3 options for Ghostscript quality (gs_quality), as indicated in the R help file:

  • printer (300dpi)
  • ebook (150dpi)
  • screen (72dpi)

The default is `"none"`. For example, to convert all PDFs in the folder mypdfs/ to ebook quality, use the command

tools::compactPDF('mypdfs/', gs_quality='ebook')
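For a single file, a sketch along the following lines may help. It assumes qpdf or Ghostscript can be found on the PATH; the simulated series and the `out` path are stand-ins, not the asker's data. As far as I know, the Ghostscript quality presets mostly affect embedded images, so for a pure line plot any gain comes from re-compressing the page streams and may be modest.

```r
set.seed(1)
out <- file.path(tempdir(), "plot.pdf")   # hypothetical output path

pdf(file = out)
plot(ts(cumsum(rnorm(1e5))), ylab = "", main = "")  # stand-in for jj
dev.off()

before <- file.size(out)
# Needs qpdf or Ghostscript; if neither is found the file is left untouched
res <- tools::compactPDF(out, gs_quality = "ebook")
after <- file.size(out)

c(before = before, after = after)
```

`compactPDF` only replaces a file when the compacted version is meaningfully smaller, so `after` should never exceed `before`.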

Mohamad Elmasri
hadley
  • Can you please give an example of this? I cannot run `tools::compactPDF(filename, qpdf = Sys.getenv("R_QPDF", "qpdf"), gs_cmd = Sys.getenv("R_GSCMD", ""), gs_quality = "screen", gs_extras = character(), ratio = 0.2)` because it fails. All necessary tools are installed on Debian 8.5. Do you need to import something extra? I feel this answer is now a stub. – Léo Léopold Hertz 준영 Nov 15 '16 at 20:42
7

You're drawing a LOT of lines or points. Vector image formats such as pdf, ps, eps, and svg store every one of those points, lines, and other items, so file size and drawing time grow with the number of elements. Vector images are generally the best choice in several ways: they are usually the most compact, they scale best, and they reproduce at the highest quality. But if the number of graphical elements becomes very large, it's often best to switch to a raster format such as png. When you switch to raster, decide in advance what size image you want, both in pixels and in print measurements, in order to produce the best image.

For information from the other direction, too large a raster image, see this answer.
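As a sketch of the raster route, assuming a 7 x 5 inch figure at 300 dpi is what the final document needs (both numbers are illustrative, as are the simulated series and the `out` path):

```r
set.seed(1)
jj <- ts(cumsum(rnorm(1e5)))              # stand-in for the real data
out <- file.path(tempdir(), "plot.png")   # hypothetical output path

# 7 x 5 inches at 300 dpi gives a 2100 x 1500 pixel image
png(filename = out, width = 7, height = 5, units = "in", res = 300)
plot(jj, ylab = "", main = "")
dev.off()

file.size(out)
```

Fixing the physical size and resolution up front means text and line widths come out at the intended scale, rather than being stretched when the image is resized later.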

Community
John
4

One way of reducing the file size is to reduce the number of values that you plot. Assuming you have a data frame called df:

# take a random sample of rows, keeping them in their original order
sampleNo <- min(10000, nrow(df))
sampleData <- df[sort(sample(nrow(df), sampleNo)), ]

I think the only other alternative within R is to produce a non-vector (raster) image. Outside of R you could use Acrobat Professional (which is not free) to optimize the PDF. This can reduce the file size enormously.

djq
4

Which version of R are you using? Since R 2.14.0, pdf() has a compress argument to support compression. I'm not sure how much it will help you, but there are also other tools to compress PDF files, such as Pdftk and qpdf. I have two wrappers for them in the animation package, but you may want to use the command line directly.
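To see what the `compress` argument does, a quick comparison along these lines may help. The simulated series and the two file names are stand-ins, not the asker's data.

```r
set.seed(1)
jj <- ts(cumsum(rnorm(1e5)))    # stand-in for the real data
f_raw <- file.path(tempdir(), "plot_raw.pdf")
f_zip <- file.path(tempdir(), "plot_zip.pdf")

# Same plot written twice: once as plain text streams, once compressed
pdf(file = f_raw, compress = FALSE)
plot(jj, ylab = "", main = "")
dev.off()

pdf(file = f_zip, compress = TRUE)
plot(jj, ylab = "", main = "")
dev.off()

sizes <- file.size(c(f_raw, f_zip))
names(sizes) <- c("uncompressed", "compressed")
sizes
```

In current versions of R `compress` is `TRUE` by default, so a file produced by a recent `pdf()` call is probably compressed already, and reducing the number of plotted points is likely to matter more.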

Yihui Xie
1

Hard to tell without seeing what the plot looks like - post a screenshot?

I suspect it's a lot of very detailed lines, and most of the information probably isn't visible: lots of things overlapping, or very, very small detail. Try thinning your data in one dimension or another. I doubt you'll lose visible information.
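One way to thin along the time axis without losing visible spikes is to keep only the minimum and maximum of each time bin. This is a sketch, not a standard function: `thin_minmax` and its `n_bins` default are hypothetical, and it handles a single series, so a multi-column `ts` would need it applied per column.

```r
# Hypothetical helper: keep the min and max of each bin so spikes survive
thin_minmax <- function(x, n_bins = 2000) {
  y <- as.numeric(x)
  n <- length(y)
  if (n <= 2 * n_bins) return(data.frame(t = seq_len(n), y = y))
  bin <- cut(seq_len(n), breaks = n_bins, labels = FALSE)
  idx <- unlist(lapply(split(seq_len(n), bin), function(i) {
    sort(unique(c(i[which.min(y[i])], i[which.max(y[i])])))
  }), use.names = FALSE)
  data.frame(t = idx, y = y[idx])
}

set.seed(1)
jj <- ts(cumsum(rnorm(1e5)))     # stand-in for one series
thin <- thin_minmax(jj)          # at most 2 * n_bins points remain

out <- file.path(tempdir(), "thin.pdf")   # hypothetical output path
pdf(file = out)
plot(thin$t, thin$y, type = "l", xlab = "Time", ylab = "")
dev.off()
```

With around 2000 bins there are already more bins than horizontal pixels in a typical figure, so the thinned plot should look very close to the original on screen and in print.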

Spacedman