
I know that PDFs are not meant for editing, but I have a requirement where I need to parse a PDF and modify it so that all text elements are converted to hyperlinks. Is there a way to achieve this?

Many Thanks,

Mukesh Kumar
  • Essentially you'll have to apply text extraction with the twist of also extracting the text location, not merely the plain text. In that extracted text with locations you have to find all texts which you want to make hyperlinks. Having found them, take their locations and add appropriate annotations to make them hyperlinked. Extraction of text with location can be done in iText using a variant of the `LocationTextExtractionStrategy` and in PDFBox by overriding `PDFTextStripper.writeString(String, List)`. – mkl Jul 21 '14 at 07:17
  • I think your first comment is quite enough for an answer. Can it be promoted to an answer? – Mukesh Kumar Jul 23 '14 at 08:13
  • Ok, I've made it an answer. – mkl Jul 23 '14 at 09:20

1 Answer

Converting text elements to hyperlinks involves multiple operations:

  1. You have to apply text extraction with the twist of also extracting the text location, not merely the plain text.

  2. In that extracted text with locations you have to find all text parts which you want to make hyperlinks.

  3. Having found them, take their locations and add appropriate annotations to make them hyperlinked.
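Steps 2 and 3 hinge on mapping a match in the extracted text back to the glyph boxes it came from. The following is a minimal, library-independent sketch of that mapping. It assumes extraction has already produced a list of glyphs with their lower-left corner and size in PDF user space; `Glyph`, `Box` and `LinkTargetFinder` are hypothetical names, and the URL regex is only an example of a selection criterion (to link every word, a pattern like `\S+` would do).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkTargetFinder {

    /** Hypothetical extraction result: glyph text plus lower-left corner and size. */
    public record Glyph(String text, float x, float y, float width, float height) {}

    /** Matched text and the rectangle (llx, lly, urx, ury) a link annotation would need. */
    public record Box(String text, float llx, float lly, float urx, float ury) {}

    /**
     * Concatenates the glyph texts, runs the pattern over the result and maps each
     * match back to the glyphs that produced it, returning the union of their boxes.
     */
    public static List<Box> find(List<Glyph> glyphs, Pattern pattern) {
        StringBuilder text = new StringBuilder();
        List<Integer> owner = new ArrayList<>(); // owner.get(i) = glyph that produced char i
        for (int g = 0; g < glyphs.size(); g++) {
            String t = glyphs.get(g).text();
            text.append(t);
            for (int k = 0; k < t.length(); k++) {
                owner.add(g);
            }
        }
        List<Box> result = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            if (m.start() == m.end()) {
                continue; // an empty match covers no glyph
            }
            float llx = Float.MAX_VALUE, lly = Float.MAX_VALUE;
            float urx = -Float.MAX_VALUE, ury = -Float.MAX_VALUE;
            for (int g = owner.get(m.start()); g <= owner.get(m.end() - 1); g++) {
                Glyph gl = glyphs.get(g);
                llx = Math.min(llx, gl.x());
                lly = Math.min(lly, gl.y());
                urx = Math.max(urx, gl.x() + gl.width());
                ury = Math.max(ury, gl.y() + gl.height());
            }
            result.add(new Box(m.group(), llx, lly, urx, ury));
        }
        return result;
    }

    public static void main(String[] args) {
        // A URL split across two extracted chunks, as often happens in real PDFs.
        List<Glyph> glyphs = List.of(
                new Glyph("Visit ", 72f, 700f, 30f, 10f),
                new Glyph("http://exam", 102f, 700f, 55f, 10f),
                new Glyph("ple.com", 157f, 700f, 35f, 10f));
        for (Box box : find(glyphs, Pattern.compile("https?://\\S+"))) {
            System.out.println(box);
        }
    }
}
```

Mapping through character offsets assumes each glyph's text is appended exactly once; if an extractor normalizes text (ligatures, bidi reordering), the offset-to-glyph table has to be built from the very string the pattern is run against.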

Extraction of text with location can be done in iText by implementing a variant of the `LocationTextExtractionStrategy` (cf. this answer; even though it is written for iTextSharp, the same principles apply) and in PDFBox by overriding `PDFTextStripper.writeString(String, List<TextPosition>)`.
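A possible PDFBox sketch of all three steps, assuming PDFBox 2.0.x (it uses `PDDocument.load`, which 3.x replaced with `Loader.loadPDF`) and Java 16+. It overrides `writeString(String, List<TextPosition>)`, splits each chunk into words at blank glyphs (a chunk may contain spaces), and afterwards adds a `PDAnnotationLink` with a `PDActionURI` over every word. Linking every word follows the question; the `https://example.com/search?q=` target is a hypothetical placeholder. The rectangle computation is simplified: it assumes an unrotated page whose crop box starts at the origin and uses the baseline plus glyph height as the vertical extent.

```java
import java.io.File;
import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.interactive.action.PDActionURI;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDBorderStyleDictionary;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

public class LinkifyText extends PDFTextStripper {

    /** A word on the current page and its box in PDF user space. */
    private record Hit(String word, PDRectangle box) {}

    private final List<Hit> hits = new ArrayList<>();

    public LinkifyText() throws IOException {
        super();
    }

    // Step 1: receive text chunks together with their glyph positions.
    @Override
    protected void writeString(String text, List<TextPosition> positions) throws IOException {
        List<TextPosition> word = new ArrayList<>();
        for (TextPosition p : positions) {
            String u = p.getUnicode();
            if (u == null || u.isBlank()) {
                addHit(word); // a blank glyph ends the current word
            } else {
                word.add(p);
            }
        }
        addHit(word);
        super.writeString(text, positions);
    }

    // Step 2: every non-empty word is a link target here; filter in this method if needed.
    private void addHit(List<TextPosition> word) {
        if (word.isEmpty()) return;
        StringBuilder sb = new StringBuilder();
        float llx = Float.MAX_VALUE, lly = Float.MAX_VALUE;
        float urx = -Float.MAX_VALUE, ury = -Float.MAX_VALUE;
        for (TextPosition p : word) {
            sb.append(p.getUnicode());
            // *DirAdj coordinates have a top-left origin; flip y into PDF user space.
            float x = p.getXDirAdj();
            float baseline = p.getPageHeight() - p.getYDirAdj();
            llx = Math.min(llx, x);
            lly = Math.min(lly, baseline);
            urx = Math.max(urx, x + p.getWidthDirAdj());
            ury = Math.max(ury, baseline + p.getHeightDir());
        }
        hits.add(new Hit(sb.toString(), new PDRectangle(llx, lly, urx - llx, ury - lly)));
        word.clear();
    }

    // Step 3: strip one page at a time, then add a URI link annotation per word.
    public void linkify(PDDocument doc) throws IOException {
        for (int pageNo = 1; pageNo <= doc.getNumberOfPages(); pageNo++) {
            hits.clear();
            setStartPage(pageNo);
            setEndPage(pageNo);
            getText(doc); // text output is discarded; writeString fills 'hits'
            PDPage page = doc.getPage(pageNo - 1);
            List<PDAnnotation> annotations = page.getAnnotations();
            for (Hit hit : hits) {
                PDActionURI action = new PDActionURI();
                action.setURI("https://example.com/search?q="
                        + URLEncoder.encode(hit.word(), StandardCharsets.UTF_8));
                PDBorderStyleDictionary noBorder = new PDBorderStyleDictionary();
                noBorder.setWidth(0);
                PDAnnotationLink link = new PDAnnotationLink();
                link.setRectangle(hit.box());
                link.setBorderStyle(noBorder);
                link.setAction(action);
                annotations.add(link);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            new LinkifyText().linkify(doc);
            doc.save(args[1]);
        }
    }
}
```

Usage would be along the lines of `java LinkifyText in.pdf out.pdf`. Each page is fully extracted before its annotations are added, so the document is not modified while it is being parsed.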

mkl