
I'm trying to convert a PDF to an image using Ghostscript 9.19, but sometimes I see the warning below, repeated many times:

**** Warning: considering '0000000000 XXXXX n' as a free entry

Fortunately, the image is still created. However, it takes a very long time when the warning is shown. Under normal conditions the conversion takes about 200~400 ms, but when the warning appears it takes more than 15 s.

I found a clue that might help solve this problem, in the question "Handling (remapping) missing/problematic (CID/CJK) fonts in PDF with ghostscript?"

This problem is caused by the Chinese, Japanese, and Korean CID fonts.

In particular, I don't need to convert the text in the PDF files (or I could substitute any other font for the CID fonts). I only need to convert the pictures in the PDF files.

So, how can I skip the text in a PDF when converting it to an image with Ghostscript? Are there options to do this? Alternatively, I could edit the Ghostscript source code, but I don't know what I would need to change.

Please give me your opinions.

Baruian

1 Answer


I am certain that the problem is not caused by CIDFonts; the 'problem' is caused by the PDF file being invalid. The cross-reference table has entries which do not conform to the specification. That's what Ghostscript is telling you.
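
As a rough illustration of the kind of entry the warning points at, the sketch below looks for classic cross-reference entries that are marked in use ('n') but claim a byte offset of 0. The function name and the sample bytes are made up for this example, and the scan is naive: it assumes a classic (non-stream) cross-reference table and may also match bytes that merely look like an entry. It is not how Ghostscript itself performs the check.

```python
import re

# A classic xref entry: 10-digit byte offset, 5-digit generation, then 'n' or 'f'.
XREF_ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])")


def suspicious_xref_entries(pdf_bytes):
    """Return (offset, generation) pairs for in-use entries whose offset is 0.

    Hypothetical helper for illustration only; it scans the raw bytes and
    does not locate the xref section properly.
    """
    return [
        (int(offset), int(gen))
        for offset, gen, kind in XREF_ENTRY.findall(pdf_bytes)
        if kind == b"n" and int(offset) == 0
    ]


# Made-up sample table: one free entry, one bad in-use entry, one normal entry.
sample = (
    b"xref\n0 3\n"
    b"0000000000 65535 f \n"
    b"0000000000 00000 n \n"
    b"0000000017 00000 n \n"
    b"trailer\n"
)
print(suspicious_xref_entries(sample))
```

An in-use object cannot start at offset 0, because that is where the file header lives, which is presumably why such an entry gets treated as free.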

However, I very much doubt that this is the reason that the processing takes so long. The combination of the time taken and the warning message makes me suspect that the file is invalid in some other way (possibly it has been through email or some other process that alters CR/LF characters).

If the file is determined to be invalid in certain ways (for example, the cross-reference table states that an object is at a specific offset in the file but there is no object at that location) then Ghostscript will attempt to repair the file. It does this by rescanning the entire file looking for every object definition, and rebuilding the cross-reference table. This can be a lengthy process, and if the file contains a great deal of binary data (e.g. images) then it can take a significant amount of time.
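
To make the cost of such a repair concrete, here is a minimal sketch of the general idea: walk the whole file, find every "N G obj" header, and record where it starts. The names are hypothetical and this is only an approximation of the technique, not Ghostscript's actual repair code; it ignores object streams and can be fooled by binary data that happens to look like an object header.

```python
import re

# "<object number> <generation> obj" at the start of a line (after CR or LF,
# or at the very beginning of the data).
OBJ_HEADER = re.compile(rb"(?<![^\r\n])(\d+)[ \t]+(\d+)[ \t]+obj\b")


def rebuild_offsets(pdf_bytes):
    """Map (object number, generation) to the byte offset of its definition.

    Hypothetical helper: every byte of the file has to be examined, which is
    why the work grows with file size.
    """
    offsets = {}
    for match in OBJ_HEADER.finditer(pdf_bytes):
        key = (int(match.group(1)), int(match.group(2)))
        # If an object is defined more than once, keep the last definition.
        offsets[key] = match.start()
    return offsets


# Made-up two-object file body for demonstration.
data = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< >>\nendobj\n"
)
print(rebuild_offsets(data))
```

Because the scan touches the entire file, a PDF that is mostly image data means a lot of bytes to search through, regardless of whether any text is rendered afterwards.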

So even if you tell Ghostscript to ignore the text, it will not solve your problem. The input PDF file will still be damaged in a way which means that the cross-reference table needs to be rebuilt, and so it will still take just as much time.

Once a PDF file has been damaged, there's no easy way to fix it. If you are seeing a number of files like this then you should check the source of the files. Note that the complete transcript (which you haven't given) should include information about the application which produced the PDF file.
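
If the transcript is not to hand, one alternative way to get a hint about the producing application is to look for the /Producer and /Creator strings of the document information dictionary in the raw file. The sketch below is a naive assumption-laden example: the function name and sample data are invented, and it only handles plain literal strings without nested or escaped parentheses, so it will miss hex strings, encrypted files, and metadata stored in compressed streams.

```python
import re

# "/Producer (some text)" or "/Creator (some text)" as a simple literal string.
INFO_STRING = re.compile(rb"/(Producer|Creator)\s*\(([^()]*)\)")


def find_producers(pdf_bytes):
    """Return any /Producer and /Creator literal strings found in the bytes.

    Hypothetical helper for triage only; it does not parse the PDF.
    """
    return {
        key.decode("ascii"): value.decode("latin-1")
        for key, value in INFO_STRING.findall(pdf_bytes)
    }


# Made-up information dictionary for demonstration.
info = b"5 0 obj\n<< /Producer (ExampleWriter 1.0) /Creator (ExampleApp) >>\nendobj\n"
print(find_producers(info))
```

Grouping the damaged files by these values can show whether they all come from one application or one delivery path.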

To answer the question: recent versions of Ghostscript (and you haven't mentioned what version you are using either, nor on which operating system) include a set of 3 command line options to ignore various types of input. If you set -dFILTERTEXT then text will be dropped. Certainly you can modify the Ghostscript source code. The PDF interpreter, however, is largely written in PostScript, so unless you are a very experienced PostScript programmer you will find it challenging to modify.
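
For reference, a command line using -dFILTERTEXT could be assembled along the following lines. This is a sketch under assumptions: the helper names, the choice of the png16m device, the 150 dpi resolution, and the file names are illustrative, and gswin64c is assumed to be the 64-bit Windows console executable on the PATH. The flag only has an effect on Ghostscript versions that support it.

```python
import subprocess


def build_gs_args(pdf_path, png_path, executable="gswin64c", dpi=150, drop_text=True):
    """Build a Ghostscript argument list that renders a PDF to a PNG.

    Hypothetical helper; the defaults are assumptions, not recommendations.
    """
    args = [
        executable,
        "-dBATCH",           # exit after processing the input
        "-dNOPAUSE",         # do not prompt between pages
        "-dSAFER",           # restrict file access from the input
        "-sDEVICE=png16m",   # 24-bit colour PNG output
        f"-r{dpi}",          # render resolution in dpi
    ]
    if drop_text:
        args.append("-dFILTERTEXT")  # drop text from the rendered output
    args += [f"-sOutputFile={png_path}", pdf_path]
    return args


def convert(pdf_path, png_path):
    """Run Ghostscript; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_gs_args(pdf_path, png_path), check=True)


print(build_gs_args("input.pdf", "output.png"))
```

Passing the arguments as a list rather than a single string avoids quoting problems with paths that contain spaces.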

Even if you do modify the source, or use -dFILTERTEXT, I doubt you will be able to avoid rebuilding the PDF file. It's not possible to say for certain without seeing an example, but it sounds to me like the PDF file is simply damaged and needs to be repaired.

KenS
    Dear KenS, thank you for your answer. You are correct. I tried using the -dFILTERTEXT option to skip the text when converting the PDF to an image, and it did not solve the problem. Also, I'm not good at the PostScript language, so it is almost impossible for me to edit the Ghostscript source code. I'm using Ghostscript 9.19 on Windows 10 64-bit (the application is built with .NET, and I will run it on various Windows versions). Anyway, I really appreciate your answer. Thank you. Have a nice day! – Baruian Jul 07 '16 at 08:39