
Currently, I have a series of images (PNGs) and, for each, an unformatted text version of their content. I'd like to make a PDF where each image becomes a full page of the resulting PDF, with the corresponding text somehow also attached to the page, so that searching for some words brings you to pages with that text on it, even though the text is never directly displayed.

This is a one-shot job, so it doesn't have to be neat or scalable. I could use any language commonly available on a Linux system, or common command-line tools. (I also have a Windows system with Acrobat available, but there are nearly a thousand images, so a manual approach wouldn't work.)

JoshDM
jade
  • My experience with PDFs is basically limited to viewing them, choosing "Save as PDF" from print dialog boxes, and scanning with Acrobat with the default options. – jade Jan 29 '13 at 19:43

1 Answer


One option would be to generate the PDF using Java and Apache FOP, but that might be more work than you're looking to do.

You might do better with iText; see the example of adding a PNG to a PDF generated with iText.

You will need to determine how to generate a Layer in which to place your searchable text; I am unable to advise you on how to do this step.

Here is how you can tell if a PDF contains text, which might help you with building one.

JoshDM
  • Using FOP to generate a PDF with one image per page seems reasonably straightforward. Does XSL-FO have a way to include searchable-but-not-visible text? Or could I use some hack like hiding the text behind the image, etc.? – jade Jan 29 '13 at 20:07
  • Updated with suggestion of using iText after searching against Fop and Layers revealed that iText may be a better alternative than Fop. I haven't generated Layers with iText before, but I assume that's what you're looking for to hide your searchable text. It also might be easier to use than Fop. – JoshDM Jan 29 '13 at 20:37
  • It is quite easy to "layer image over text" in PDF by simply first adding the text to the page content and then the image, no magic involved. Alternatively, btw, PDF allows you to add text in a mode not showing the text (only making it selectable, copy&pastable, etc.). – mkl Jan 29 '13 at 21:53
  • Yes, but the next step would then be working out how to do this in FOP or iText. – JoshDM Jan 30 '13 at 13:06
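
To make mkl's second suggestion concrete, here is a minimal sketch in plain Python (standard library only) that writes a PDF whose pages carry text in rendering mode 3, the PDF text mode that neither fills nor strokes the glyphs, so the text is present but not drawn. It is not FOP or iText code, and it is only an illustration under some loud assumptions: the function names (`escape_pdf_string`, `invisible_text_stream`, `build_pdf`) are made up for this example, the text is treated as ASCII (anything else is replaced with `?`), the page size is US Letter, and the image itself is left out. A real page would additionally draw the PNG as an image XObject in the same content stream.

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_stream(lines, x=36, y=756, size=10, leading=12) -> bytes:
    """Build a page content stream that places `lines` as invisible text."""
    ops = [
        "BT",                 # begin text object
        f"/F1 {size} Tf",     # select font resource F1 at the given size
        "3 Tr",               # text rendering mode 3: invisible
        f"{leading} TL",      # line spacing used by T*
        f"{x} {y} Td",        # start position, in points from bottom-left
    ]
    for line in lines:
        ops.append(f"({escape_pdf_string(line)}) Tj")  # show one line
        ops.append("T*")                               # move to next line
    ops.append("ET")          # end text object
    # Assumption: ASCII-only text; other characters become "?".
    return "\n".join(ops).encode("ascii", "replace")


def build_pdf(page_texts, width=612, height=792) -> bytes:
    """Return PDF bytes with one page per entry of `page_texts`.

    `page_texts` is a list of pages, each a list of text lines.
    Objects: 1 = catalog, 2 = page tree, 3 = font, then one
    page object and one content stream per page.
    """
    objects = {
        1: b"<< /Type /Catalog /Pages 2 0 R >>",
        3: b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    }
    kids = []
    for i, lines in enumerate(page_texts):
        page_num = 4 + 2 * i
        content_num = page_num + 1
        kids.append(f"{page_num} 0 R")
        stream = invisible_text_stream(lines)
        objects[content_num] = (
            b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream"
        )
        objects[page_num] = (
            f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
            f"/Resources << /Font << /F1 3 0 R >> >> "
            f"/Contents {content_num} 0 R >>"
        ).encode("ascii")
    objects[2] = (
        f"<< /Type /Pages /Kids [{' '.join(kids)}] /Count {len(kids)} >>"
    ).encode("ascii")

    out = bytearray(b"%PDF-1.4\n")
    offsets = {}
    for num in sorted(objects):
        offsets[num] = len(out)  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % num + objects[num] + b"\nendobj\n"

    xref_pos = len(out)
    count = len(objects) + 1     # plus the reserved free entry 0
    out += b"xref\n0 %d\n" % count
    out += b"0000000000 65535 f \n"
    for num in sorted(objects):
        out += b"%010d 00000 n \n" % offsets[num]
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        count,
        xref_pos,
    )
    return bytes(out)
```

For example, `open("out.pdf", "wb").write(build_pdf([["text of page one"], ["text of page two"]]))` would write a two-page file. For the actual job, a PDF library is likely the more practical route, since it would handle image embedding, fonts, and non-ASCII text; the point here is only that the "hidden text" is an ordinary text object with `3 Tr` set before it is shown.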