
I'm using Ghostscript on a Linux server to combine PDFs. I'm using the latest version, 9.05. I typically have to combine 5 or more PDFs. Currently it takes around 20 seconds to combine 3 PDFs, which seems really slow to me. Here's the command line I'm using:

```
gs -dBATCH -dNOPAUSE -dNOGC -q -sDEVICE=pdfwrite -sOutputFile=output.pdf -c 3000000 setvmthreshold -f a.pdf b.pdf c.pdf
```

Any suggestions?

  • How big are the 3 PDFs in terms of bytes and pages? How many fonts are used by each PDF? (Use `pdffonts a.pdf`, `pdffonts b.pdf`...) What are the same figures for `output.pdf`? Do you know what effect `-dNOGC` is supposed to have? Does it work? – Kurt Pfeifle Apr 26 '12 at 20:42
  • @pipitas Good point. Turning off the GC could cause more swapping. Also, the argument to the `-c` option needs quotes. There is no `-f` option. – luser droog Apr 29 '12 at 00:14
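A rough sketch for collecting the numbers asked about above and re-timing the merge, assuming poppler-utils (`pdfinfo`, `pdffonts`) is installed. The helper names (`describe_pdfs`, `gs_merge`, etc.) and the `DRY_RUN` switch are made up for this example. `gs_merge` is the question's command with `-dNOGC` dropped and the `-c` argument quoted; whether that is actually faster depends on the input files, so time both versions.

```shell
#!/bin/sh
# Hypothetical helpers for measuring inputs and re-running the merge.
# Requires pdfinfo and pdffonts (poppler-utils) for describe_pdfs.

# Print the "Pages:" value from pdfinfo output read on stdin.
pages_from_pdfinfo() {
    awk '/^Pages:/ { print $2 }'
}

# Count font rows in pdffonts output read on stdin (skips the 2 header lines).
fonts_from_pdffonts() {
    awk 'NR > 2 { n++ } END { print n + 0 }'
}

# Usage: describe_pdfs FILE...
# Prints one "FILE: N bytes, N pages, N fonts" line per file.
describe_pdfs() {
    for f in "$@"; do
        bytes=$(wc -c < "$f" | tr -d ' ')   # tr strips padding some wc versions add
        pages=$(pdfinfo "$f" | pages_from_pdfinfo)
        fonts=$(pdffonts "$f" | fonts_from_pdffonts)
        printf '%s: %s bytes, %s pages, %s fonts\n' "$f" "$bytes" "$pages" "$fonts"
    done
}

# The question's command with -dNOGC removed and the -c argument quoted.
# Per Ghostscript's usage docs, -f ends the PostScript tokens given to -c,
# so the remaining arguments are read as input files.
# Usage: gs_merge OUTPUT INPUT...   (DRY_RUN=1 prints the command instead)
gs_merge() {
    out=$1
    shift
    ${DRY_RUN:+echo} gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite \
        "-sOutputFile=$out" -c "3000000 setvmthreshold" -f "$@"
}
```

In bash, `describe_pdfs a.pdf b.pdf c.pdf`, then `time gs_merge output.pdf a.pdf b.pdf c.pdf`, then `describe_pdfs output.pdf` covers the questions in the first comment.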

4 Answers


Ghostscript fully interprets each PDF file down to marking operations and then builds a new file from the combined content. This is, obviously, far slower than simply copying the content streams across, which is why what you are doing seems slow.

As suggested above, use a tool which just copies the content streams and objects, renumbering them as required; this will be much faster. In addition to pdfjam (which I don't know anything about), you could also look at pdftk. There are bound to be others as well.
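For example, a pdftk merge might look like the sketch below. `pdftk_merge` and the `DRY_RUN` switch are hypothetical names for this example; the `pdftk IN... cat output OUT` form is pdftk's usual concatenation syntax.

```shell
#!/bin/sh
# Hypothetical wrapper: concatenate PDFs with pdftk, which copies page
# objects rather than re-interpreting them the way Ghostscript does.
# Usage: pdftk_merge OUTPUT INPUT...   (DRY_RUN=1 prints the command instead)
pdftk_merge() {
    out=$1
    shift
    # "cat" with no page ranges takes every page of every input, in order.
    ${DRY_RUN:+echo} pdftk "$@" cat output "$out"
}
```

Running `pdftk_merge output.pdf a.pdf b.pdf c.pdf` should produce the same page sequence as the Ghostscript command in the question.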

answered by KenS (edited by Kurt Pfeifle)

If you just need to concatenate some PDFs, you might check out pdfjam. I've never found it slow at concatenation, but it does at times produce output PDFs that print rather slowly.
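A minimal pdfjam equivalent, as a sketch: `pdfjam_merge` and `DRY_RUN` are hypothetical names, and `--outfile` is the pdfjam option that names the combined output. pdfjam works by running LaTeX with the pdfpages package, so it is worth checking that page sizes come through as expected.

```shell
#!/bin/sh
# Hypothetical wrapper: concatenate PDFs with pdfjam. With no page
# selection given, pdfjam takes all pages of each input file in order.
# Usage: pdfjam_merge OUTPUT INPUT...   (DRY_RUN=1 prints the command instead)
pdfjam_merge() {
    out=$1
    shift
    ${DRY_RUN:+echo} pdfjam "$@" --outfile "$out"
}
```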


answered by user1277476

Use pdfconcat; it'll do it in a split second. Ghostscript is slow at everything.

answered by Alasdair
  • This statement misses the point. Ghostscript is doing a lot more than `pdfjam` is doing. *`pdfjam` is doing one thing only, but doing it fast.* (Note: I didn't say *'...but doing it **well**.'* I didn't say that because 'well' may be defined differently in different circumstances: see also the [updated] answer here http://stackoverflow.com/a/4233975/359307 . That's why I asked my questions to @Matt Orlando above.) – Kurt Pfeifle Apr 27 '12 at 15:51
  • Ghostscript does not combine PDFs; it takes them apart and recreates a totally new PDF, having done many other things along the way, including recompressing all the images, which the OP probably does not want. There are many ways to combine PDF files, some better than others, as you stated, but Ghostscript is not the best solution. Anyway, I said `pdfconcat`, not `pdfjam`. Personally I would have coded the app for this myself, since I don't think there are any really good solutions available. – Alasdair Apr 28 '12 at 03:24
  • I know quite well how Ghostscript works and what its strong (as well as its weak) points are. :-) -- Depending on **how exactly** you want an output PDF to be, Ghostscript *may* be the best solution. If the **speed of finishing the result** is the only concern, Ghostscript certainly isn't the best for the reasons KenS (who is a Ghostscript developer) and you (who obviously did read KenS' comment) outlined... (What I said about `pdfjam` is true for `pdfconcat` too.) Your answer was one-sided, whereas user1277476's answer did qualify its recommendation. – Kurt Pfeifle Apr 28 '12 at 05:49
  • I'm sorry my answer wasn't up to your expectations. – Alasdair Apr 28 '12 at 09:16

After tracking down what was causing time-outs, I've noticed that Ghostscript is a lot faster dealing with PDF v1.4 (Acrobat 5 compatibility) files as opposed to v1.7. Simply saving the PDF files as v1.4 speeds things up when the files are later processed by Ghostscript.

This may not be applicable to all situations. Depending on how slow things are, it may be worth first using Ghostscript to convert the PDFs down to a lower version and then performing the other operations. I'm seeing times drop from 30+ seconds to ~1 second just by changing the PDF version.
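A sketch of that two-step approach. `-dCompatibilityLevel=1.4` is the pdfwrite parameter that sets the output PDF version; the function name, the temporary-directory handling, and the `DRY_RUN` switch are hypothetical. Each conversion is itself a full Ghostscript pass, so this only pays off if the converted copies are reused or if the later step speeds up enough to cover that cost.

```shell
#!/bin/sh
# Hypothetical helper: rewrite each input as PDF 1.4 with Ghostscript's
# pdfwrite device, then merge the converted copies.
# Usage: merge_via_pdf14 OUTPUT INPUT...   (DRY_RUN=1 prints commands instead)
merge_via_pdf14() {
    out=$1
    shift
    if [ -n "$DRY_RUN" ]; then
        tmpdir=/tmp/pdf14.XXXXXX        # placeholder; nothing is created
    else
        tmpdir=$(mktemp -d) || return 1
    fi
    n=$#
    for f in "$@"; do
        # Assumes input basenames are unique; duplicates would overwrite.
        c=$tmpdir/$(basename "$f")
        ${DRY_RUN:+echo} gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite \
            -dCompatibilityLevel=1.4 "-sOutputFile=$c" "$f" ||
            { ${DRY_RUN:+echo} rm -rf "$tmpdir"; return 1; }
        set -- "$@" "$c"                # append the converted path
    done
    shift "$n"                          # keep only the converted paths
    ${DRY_RUN:+echo} gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite \
        "-sOutputFile=$out" "$@"
    status=$?
    ${DRY_RUN:+echo} rm -rf "$tmpdir"   # remove the temporary copies
    return "$status"
}
```

Usage would be `merge_via_pdf14 output.pdf a.pdf b.pdf c.pdf`; with `DRY_RUN=1` set it only prints the conversion, merge, and cleanup commands.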

answered by timtom