I'm using iTextSharp to extract the text from a page in a PDF document, like this:

var locationTextExtractionStrategy = new LocationTextExtractionStrategy();
string textFromPage = PdfTextExtractor.GetTextFromPage(pdfReader, i + 1, locationTextExtractionStrategy);

I understand from previous questions here that, to get the position of the text, I need to call:

renderInfo.GetBaseline().GetStartPoint();

But I don't understand how to call that method from `LocationTextExtractionStrategy`.

RamblerToning
  • You don't. Here's an answer that should point you in the right direction: [Link](http://stackoverflow.com/questions/7096093/itextsharp-getfieldpositions-to-setsimplecolumn) – safetyOtter Mar 17 '14 at 16:22
  • You don't. If you are looking for the position of an AcroForm text field, follow @safetyOtter's link. Otherwise you have to create your own extraction strategy, and in its `renderText` method you can access the `TextRenderInfo renderInfo`. Your strategy can of course be derived from an existing strategy, or be a copy of one with some changes. Cf. e.g. [here](http://stackoverflow.com/a/13719947/1729265) or [here](http://stackoverflow.com/a/15086367/1729265). – mkl Mar 17 '14 at 16:27
  • I've tried using AcroFields (`AcroFields form = pdfReader.AcroFields;`), but `form` is empty and has no keys or values. I don't think my PDF has data encoded as forms. – RamblerToning Mar 17 '14 at 17:21
  • I've written my own extraction strategy now, and it's populating a class I've made: `public class TextBoxPos { public float xstart; public float xend; public float ystart; public float yend; public StringBuilder content = new StringBuilder(); }`. But `PdfTextExtractor.GetTextFromPage` only returns a string. How can I modify this to return something other than a string? I want to return the text as well as the coordinates. – RamblerToning Mar 18 '14 at 14:42
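
For what it's worth, here is a minimal sketch of the approach mkl describes, assuming iTextSharp 5.x. The names `PositionTrackingStrategy`, `Chunks` and `ExtractionExample` are made up for illustration, and `TextBoxPos` is a simplified variant of the class from the comment above, not the asker's actual code. The idea is that `GetTextFromPage` still returns only a string, but the strategy object you pass in can collect the coordinates as a side effect, and you read them back from it afterwards.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical holder for one text chunk and its bounding coordinates.
public class TextBoxPos
{
    public float XStart, XEnd, YStart, YEnd;
    public string Content;
}

// Hypothetical strategy: keeps the normal text extraction of the base class
// and additionally records where each rendered chunk sits on the page.
public class PositionTrackingStrategy : LocationTextExtractionStrategy
{
    public readonly List<TextBoxPos> Chunks = new List<TextBoxPos>();

    public override void RenderText(TextRenderInfo renderInfo)
    {
        // Let the base class collect the text as usual.
        base.RenderText(renderInfo);

        // Descent line start and ascent line end are used here as the
        // opposite corners of the chunk (this assumes unrotated text).
        Vector bottomLeft = renderInfo.GetDescentLine().GetStartPoint();
        Vector topRight = renderInfo.GetAscentLine().GetEndPoint();

        Chunks.Add(new TextBoxPos
        {
            XStart = bottomLeft[Vector.I1],
            YStart = bottomLeft[Vector.I2],
            XEnd = topRight[Vector.I1],
            YEnd = topRight[Vector.I2],
            Content = renderInfo.GetText()
        });
    }
}

public static class ExtractionExample
{
    // Returns the page text and hands back the recorded positions via 'chunks'.
    public static string GetTextWithPositions(
        PdfReader pdfReader, int pageNumber, out List<TextBoxPos> chunks)
    {
        var strategy = new PositionTrackingStrategy();
        string textFromPage =
            PdfTextExtractor.GetTextFromPage(pdfReader, pageNumber, strategy);
        chunks = strategy.Chunks;
        return textFromPage;
    }
}
```

Note that `RenderText` is called once per text-drawing operation in the page content, so a chunk may be a whole line, a single word, or just a few characters, depending on how the PDF was produced. If you need positions per word or per line, you would have to merge neighbouring chunks yourself.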
