
I have a PDF file that contains subscript and superscript text.

Example: [image not available]

When I read it line by line using the iText library, it returns:

```
1. Introduction of v section 
ref tm
This is simple word document. Us
working or not.
t tm
1.1 Document Summary 
Here is document summary. 
```

As the output shows, subscript text is read as a separate line after the line it belongs to, and superscript text is read as a separate line before it (here, before the corresponding header).

How can I read each complete line, including its subscript and superscript parts, using the iText jar?

Sample code

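```java
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import java.io.IOException;

public void usingItext() {
    PdfReader pdfReader = null;
    try {
        pdfReader = new PdfReader("samplewordDoc_pdf_doc_new.pdf");
        int pages = pdfReader.getNumberOfPages();
        // iText page numbers are 1-based; use <= so the last page is not skipped
        for (int i = 1; i <= pages; i++) {
            // Extract the page text and split it at line breaks
            String[] lines = PdfTextExtractor.getTextFromPage(pdfReader, i).split("\\r?\\n");
            for (String line : lines) {
                System.out.println(line);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Release the reader even if extraction fails
        if (pdfReader != null) {
            pdfReader.close();
        }
    }
}
```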
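One possible approach, sketched here under assumptions: instead of splitting the extracted page text at newlines, collect the positioned text chunks yourself and put two chunks on the same line whenever their vertical ranges (descent to ascent) overlap. A raised superscript or lowered subscript still overlaps the range of the normal text next to it, so it is merged into that line instead of being emitted as a line of its own. This is not the `HorizontalTextExtractionStrategy` from the linked answer, only a plain-Java sketch of the grouping idea; `LineGrouper` and `Chunk` are hypothetical names. The chunk coordinates are assumed to come from iText 5, for example from a custom `TextExtractionStrategy` whose `renderText(TextRenderInfo)` reads `getText()`, `getBaseline().getStartPoint().get(Vector.I1)` for x, and the y values of `getDescentLine()` and `getAscentLine()`, passed to `PdfTextExtractor.getTextFromPage(reader, page, strategy)`; that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical helper: groups positioned text chunks into visual lines by
 * vertical overlap, so sub- and superscript chunks (shifted baselines) end up
 * on the same line as the surrounding text. Coordinates are assumed to be in
 * PDF user space, where y grows upward.
 */
public class LineGrouper {

    /** A piece of text with its left edge and vertical extent (hypothetical type). */
    public static final class Chunk {
        final String text;
        final float x;      // left edge of the chunk
        final float bottom; // descent (lowest y)
        final float top;    // ascent (highest y)

        public Chunk(String text, float x, float bottom, float top) {
            this.text = text;
            this.x = x;
            this.bottom = bottom;
            this.top = top;
        }
    }

    /** A line under construction: its chunks plus the union of their vertical ranges. */
    private static final class Line {
        final List<Chunk> chunks = new ArrayList<>();
        float bottom;
        float top;

        boolean overlaps(Chunk c) {
            return c.bottom < top && c.top > bottom;
        }

        void add(Chunk c) {
            if (chunks.isEmpty()) {
                bottom = c.bottom;
                top = c.top;
            } else {
                bottom = Math.min(bottom, c.bottom);
                top = Math.max(top, c.top);
            }
            chunks.add(c);
        }
    }

    /** Returns one string per visual line, top of the page first. */
    public static List<String> groupIntoLines(List<Chunk> chunks) {
        List<Line> lines = new ArrayList<>();
        for (Chunk c : chunks) {
            // Greedy: join the first line whose vertical range overlaps the chunk
            Line target = null;
            for (Line l : lines) {
                if (l.overlaps(c)) {
                    target = l;
                    break;
                }
            }
            if (target == null) {
                target = new Line();
                lines.add(target);
            }
            target.add(c);
        }
        // Larger y is higher on the page in PDF user space
        lines.sort(Comparator.comparingDouble((Line l) -> l.top).reversed());
        List<String> result = new ArrayList<>();
        for (Line l : lines) {
            // Left to right within a line; no spaces are inserted between chunks
            l.chunks.sort(Comparator.comparingDouble(c -> c.x));
            StringBuilder sb = new StringBuilder();
            for (Chunk c : l.chunks) {
                sb.append(c.text);
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

The grouping is greedy (a chunk joins the first overlapping line) and adds no spaces, so each chunk's text is assumed to carry its own spacing. With tight line spacing, the ascent of one line can overlap the descent of the next, so in practice the ranges would probably need to be shrunk or compared with a tolerance.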
Tilman Hausherr
Vijay Gajera
  • Have you tried the `HorizontalTextExtractionStrategy` from [this answer](https://stackoverflow.com/a/33697745/1729265)? – mkl Jun 21 '19 at 11:50
  • I am using iText version 5.5.10, and it has no `HorizontalTextExtractionStrategy`. Can you please help me with how to achieve this? – Vijay Gajera Jun 21 '19 at 12:19
  • You may have noticed the "from [this answer](https://stackoverflow.com/a/33697745/1729265)" in that comment. Read it. – mkl Jun 21 '19 at 14:12
