6

I would like to know if it's possible to convert a PDF to an image without fonts. My goal is to get only the image, without the text.

If so, can I do it with ImageMagick or Ghostscript?

Here is an example:

The final image: http://crocodoc_public.s3.amazonaws.com/8b8aa154-45e3-41f9-a465-628e1b2e955d/images/page-001.png

and the original PDF: http://crocodoc.com/demo/efwpa (page 2). We can see that the text is overlaid on top of the image; I want to do the same.

yvan
  • Not exactly the same question, but it may help: http://stackoverflow.com/questions/653380/converting-a-pdf-to-png – Doc Brown Sep 09 '11 at 13:50
  • Thanks. I have seen almost all the questions about converting; I spent my night on this... I can already convert a PDF to an image, but I would like to do it without the fonts or, put more simply, without the text. – yvan Sep 09 '11 at 14:33
  • I don't understand what you mean. If you convert a PDF to PNG, there are no "fonts" in the PNG. Can you explain in more detail, or (better) give an example? – Doc Brown Sep 09 '11 at 17:02
  • I changed my question and added an example, thanks. – yvan Sep 09 '11 at 17:24

3 Answers

1

I too am on the lookout for something like that. While playing with ImageMagick I tried this command and got some unexpected results.

convert input.pdf -blur 0x0 output.jpg

This removed the text layers from the PDFs I tried.

I cannot guarantee that this will work for you, or that it is the right way to achieve this, but you may try it.

hussainb
1

So if I got you right, what you want is to remove some text from your PDF (not fonts), and you want to do it programmatically. I suspect you already know that this will only be possible if the text is placed on some kind of separate layer in your PDF files. You can try to use iText for that. Beware: this means you will have to invest some days in learning how to use that library.

Doc Brown
  • Exactly, I was thinking of fonts, because some people have problems with text when they don't have the fonts. I had a look at iText, but I'm not at all comfortable with Java. I will try. – yvan Sep 12 '11 at 05:52
  • I tried iText and unfortunately couldn't reach my goal. It seems it doesn't have what I'm looking for. – yvan Sep 14 '11 at 06:53
  • @yvan: perhaps you should ask a new, more specific question here on SO, posting some lines of your code using iText and telling us exactly which problems occurred / what does not work. iText is a powerful tool, but usage is not always straightforward. – Doc Brown Sep 14 '11 at 11:33
0

You can do that with Adobe Acrobat. Select the text with the touch up tool and delete it. I don't think you can do that with Ghostscript. You could consider editing the PDF by hand (qpdf helps).
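
For what it's worth, newer Ghostscript releases (9.23 and later, as far as I know) have a -dFILTERTEXT switch that skips text-drawing operations while rendering, which may be enough here. Below is a minimal sketch under that assumption. The function name render_without_text, the file names, and the 150 dpi default are made up for illustration, and I have not checked it against PDFs where the "text" is really vector outlines or part of a scanned image.

```shell
#!/bin/sh
# Render each page of a PDF to PNG while skipping text-drawing operations.
# Assumption: the installed Ghostscript supports -dFILTERTEXT.
# GS may be overridden (e.g. GS=echo) for a dry run that prints the arguments.
render_without_text() {
    input=$1          # hypothetical input path, e.g. input.pdf
    out_pattern=$2    # output pattern, e.g. page-%03d.png (one file per page)
    dpi=${3:-150}     # rendering resolution; 150 is an arbitrary default

    "${GS:-gs}" -q -dNOPAUSE -dBATCH -dSAFER \
        -sDEVICE=png16m -r"$dpi" \
        -dFILTERTEXT \
        -sOutputFile="$out_pattern" \
        "$input"
}
```

It would be called as render_without_text input.pdf 'page-%03d.png' 150. Running it with GS=echo first prints the argument list without rendering anything, which is handy for checking the paths before a batch run.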

topskip
  • Unfortunately, I don't know in advance how many PDFs I'm going to have, and doing each one by hand would take a long time. But QPDF is a very interesting tool; I'll consider it as a helper. – yvan Sep 09 '11 at 18:06