In brief, I'm dealing with a problematic PDF, which:
- cannot be fully rendered in a document viewer like evince, because of missing font information;
- can, however, be fully rendered by ghostscript.
Thus, regardless of what ghostscript uses to fill in the blanks (maybe fallback glyphs, or a different method of accessing fonts), I'd like to use ghostscript to produce ("distill") an output PDF in which pretty much nothing is changed except that font information is added, so that evince can render the document the same way ghostscript does.
My question is thus: is this possible at all, and if so, what would the command line be to achieve something like that?
Many thanks in advance for any answers,
Cheers!
Details:
I'm actually on an older Ubuntu 10.04, and I might be experiencing not a bug but an installation problem with evince (the lack of the poppler-data package), as noted in Bug #386008 “Some fonts fail to display due to ‘Unknown font tag...’” for the Ubuntu “poppler” package.
However, that is exactly the situation I'd like to handle, so I'll use the fontspec.pdf attached to that bug report ("PDF triggering the bug.") to demonstrate the problem.
evince
First, I open page 3 of this PDF in evince, and evince complains:
$ evince --page-label=3 fontspec.pdf
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'F5.1'
Error (7597): No font in show
Error: Unknown font tag 'F5.1'
Error (7630): No font in show
Error: Unknown font tag 'F5.1'
Error (7660): No font in show
Error: Unknown font tag 'F5.1'
...
In the resulting rendering, it is obvious that some font shapes are missing.
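To see which fonts on page 3 are affected, poppler's own pdffonts tool (from poppler-utils, assumed to be installed) prints a table of fonts with an "emb" column saying whether each one is embedded. Below is a minimal sketch of a hypothetical helper that filters that table down to the fonts that are not embedded. Because different poppler versions print different columns, it locates the "emb" column from the dashed separator line instead of relying on a fixed field number.

```shell
#!/bin/sh
# list_unembedded.sh (hypothetical name): print the fonts that pdffonts
# reports as NOT embedded. Assumes poppler-utils' pdffonts is installed.
# Usage: sh list_unembedded.sh -f 3 -l 3 fontspec.pdf

# Reads pdffonts output on stdin. Column positions are taken from the
# dashed separator line, because the set of columns differs between
# poppler versions (older ones have no "encoding" column).
unembedded_from_pdffonts() {
  awk '
    NR == 1 { header = $0; next }
    NR == 2 {
      # Each run of dashes marks one column: record its start and width.
      n = 0; pos = 1; rest = $0
      while (match(rest, /-+/)) {
        n++
        start[n] = pos + RSTART - 1
        width[n] = RLENGTH
        pos += RSTART + RLENGTH - 1
        rest = substr(rest, RSTART + RLENGTH)
      }
      # Find the column whose header reads "emb".
      for (i = 1; i <= n; i++) {
        title = substr(header, start[i], width[i])
        gsub(/ /, "", title)
        if (title == "emb") emb = i
      }
      next
    }
    emb {
      flag = substr($0, start[emb], width[emb])
      gsub(/ /, "", flag)
      if (flag == "no") {
        name = substr($0, start[1], width[1])
        sub(/ +$/, "", name)
        print name
      }
    }
  '
}

# Only call pdffonts when arguments are given, so the function can also
# be sourced and fed canned input.
if [ "$#" -gt 0 ]; then
  pdffonts "$@" | unembedded_from_pdffonts
fi
```

Running it on the original file and on any ghostscript output lets you check whether a conversion actually embedded the previously missing font. Keep in mind that a page extracted with -dFirstPage=3 -dLastPage=3 becomes page 1 of the new file.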
Adobe acroread
Just a note on how Adobe's Acrobat Reader for Linux behaves; the following command line:
$ ./Adobe/Reader9/bin/acroread /a "page=3" fontspec.pdf
... produces no terminal output whatsoever (for more on the /a switch, see the acroread man page), and the program has absolutely no problem displaying the fonts.
Also, although I'd like to avoid a roundtrip through PostScript, note that acroread itself can be used to convert a PDF to PostScript:
$ ./Adobe/Reader9/bin/acroread -v
9.5.1
$ ./Adobe/Reader9/bin/acroread -toPostScript \
-rotateAndCenter -choosePaperByPDFPageSize \
-start 3 -end 3 \
-level3 -transQuality 5 \
-optimizeForSpeed -saveVM \
fontspec.pdf ./
Again, the above command line produces no terminal output. The -optimizeForSpeed and -saveVM options are there because they apparently deal with fonts; the last argument, ./, is the output directory (the output file is automatically named fontspec.ps).
Now evince can display the previously missing fonts in the fontspec.ps output, but it again complains:
$ evince fontspec.ps
GPL Ghostscript 9.02: Error: Font Renderer Plugin ( FreeType ) return code = -1
GPL Ghostscript 9.02: Error: Font Renderer Plugin ( FreeType ) return code = -1
...
... and furthermore, all text seems to have been flattened to curves in the PostScript, so the text in the .ps file can no longer be selected in evince (note that the .ps file cannot be opened in acroread). However, one can convert this .ps back into a .pdf:
$ pstopdf fontspec.ps # note, `pstopdf` has no output filename option;
# it will automatically choose 'fontspec.pdf',
# and overwrite previous 'fontspec.pdf' in
# the same directory
... and now the text in the output of pstopdf is selectable in evince, all fonts are present, and evince no longer complains. However, as I noted, I'd like to avoid the roundtrip through PostScript files altogether.
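If the roundtrip is used anyway, one way to keep pstopdf from overwriting the original fontspec.pdf is to do the .ps to .pdf step with ghostscript's own ps2pdf script, which accepts an explicit output filename. This is a sketch under that assumption (ps2pdf, shipped with ghostscript, is on PATH). The "-roundtrip" suffix is a hypothetical naming choice, and I have not checked whether ps2pdf leaves the text as selectable as pstopdf did.

```shell
#!/bin/sh
# Sketch: redo the .ps -> .pdf step with ghostscript's ps2pdf, writing to
# a new file so the input fontspec.pdf in the same directory survives.
# Usage: sh roundtrip.sh fontspec.ps   (writes fontspec-roundtrip.pdf)

# Derive the output name, e.g. fontspec.ps -> fontspec-roundtrip.pdf
roundtrip_name() {
  printf '%s\n' "${1%.ps}-roundtrip.pdf"
}

# Only run ps2pdf when an input file is given.
if [ "$#" -gt 0 ]; then
  ps2pdf "$1" "$(roundtrip_name "$1")"
fi
```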
display (from imagemagick)
We can also view the same page of the same document with ImageMagick's display (note that panning the image from the command line with display is apparently still not available, so I've used -crop below to adjust the viewport):
$ display -density 150 -crop 740x450+280+200 fontspec.pdf[2]
**** Warning: considering '0000000000 00000 n' as a free entry.
...
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Mac OS X 10.5.4 Quartz PDFContext <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
... which generates some ghostscript-ish errors, and in the result, the fonts that evince couldn't render are now shown properly by ImageMagick's display.
ghostscript
Finally, we can use ghostscript itself as an X11 viewer to look at the same page of the same document:
$ gs -sDEVICE=x11 -g740x450 -r150x150 -dFirstPage=3 \
-c '<</PageOffset [-120 520]>> setpagedevice' \
-f fontspec.pdf
GPL Ghostscript 9.02 (2011-03-30)
Copyright (C) 2010 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
Processing pages 3 through 74.
Page 3
>>showpage, press <return> to continue<<
^C
... and the page is rendered with all of its glyphs present.
In conclusion: ghostscript (and, apparently by extension, ImageMagick) can seemingly find the missing font (or at least some replacement for it) and render the page with it, even though evince fails to do so for the same document.
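To find out what ghostscript actually uses to fill in the blanks, one option is to run its interpreter over page 3 with the nullpage device, which renders nothing, and keep only the lines that mention fonts. This is a sketch under the assumption that gs is on PATH and that font loading or substitution notices are printed (they are suppressed by -dQUIET). The exact wording of those messages differs between ghostscript versions, so the grep pattern is deliberately loose. The function name probe_fonts is hypothetical.

```shell
#!/bin/sh
# Sketch: interpret one page without rendering it (nullpage device) and
# keep only ghostscript's messages that mention fonts.
# Usage: sh probe_fonts.sh fontspec.pdf 3
probe_fonts() {
  # $1 = PDF file, $2 = page number; stderr is merged so warnings are seen
  gs -dBATCH -dNOPAUSE -dSAFER -sDEVICE=nullpage \
     -dFirstPage="$2" -dLastPage="$2" -f "$1" 2>&1 | grep -i font
}

# Only run gs when both a file and a page number are given.
if [ "$#" -ge 2 ]; then
  probe_fonts "$1" "$2"
fi
```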
I would therefore simply like to export a PDF from ghostscript that has only the missing fonts embedded, with no other processing, so I try this:
$ gs -dBATCH -dNOPAUSE -dSAFER \
-dEmbedAllFonts -dSubsetFonts=true -dMaxSubsetPct=99 \
-dAutoFilterMonoImages=false \
-dAutoFilterGrayImages=false \
-dAutoFilterColorImages=false \
-dDownsampleColorImages=false \
-dDownsampleGrayImages=false \
-dDownsampleMonoImages=false \
-sDEVICE=pdfwrite \
-dFirstPage=3 -dLastPage=3 \
-sOutputFile=mypg3out.pdf -f fontspec.pdf
GPL Ghostscript 9.02 (2011-03-30)
Copyright (C) 2010 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
**** Warning: considering '0000000000 00000 n' as a free entry.
Processing pages 3 through 3.
Page 3
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Mac OS X 10.5.4 Quartz PDFContext <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
... but it doesn't work: the output file mypg3out.pdf suffers from exactly the same problems in evince as noted previously.
Note: although I'd like to avoid the PostScript roundtrip, a good example of a gs command line with font embedding is here: (#277826) pdf - How to make GhostScript PS2PDF stop subsetting fonts. However, the same command line switches for .pdf to .pdf conversion do not seem to have any effect on the problem described above.