I have a PDF file that was produced with iText and created with JasperReports (I don't know whether that is relevant). Is there an API or tool that lets me inspect its structure? I need to extract text from it.
- I tried iText, PDFBox, and other Java libraries, but I only get the text line by line, which is not what I need.
- I also tried converting it to HTML, XML, and DOM, but the result is the same as with text extraction: no structure is parsed.
- If I open it as DOCX, I see that Word recognizes some structure: for example, an area that looks like a table in the PDF is an actual table after conversion to DOCX.
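One thing that might be worth checking first is whether the file carries any structure at all, or whether Word is only guessing it from the layout. A tagged PDF declares a `/StructTreeRoot` entry in its catalog (usually together with `/MarkInfo`), and the producer and creator strings live under `/Producer` and `/Creator`. Below is a rough sketch that scans the raw bytes for those names. `PdfMarkerScan` and `containsMarker` are hypothetical names of mine, not part of iText or PDFBox, and the check is only a heuristic: objects stored in compressed object streams or in an encrypted file will not show up, so "not found" is inconclusive.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; it does not parse the PDF,
// it just searches the file's raw bytes for PDF name tokens.
class PdfMarkerScan {

    // Returns true if the ASCII bytes of `marker` occur anywhere in `data`.
    static boolean containsMarker(byte[] data, String marker) {
        byte[] needle = marker.getBytes(StandardCharsets.US_ASCII);
        if (needle.length == 0) {
            return true;
        }
        for (int i = 0; i + needle.length <= data.length; i++) {
            int j = 0;
            while (j < needle.length && data[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfMarkerScan.java <file.pdf>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        // Names assumed to be stored uncompressed; see the caveat above.
        String[] markers = {"/StructTreeRoot", "/MarkInfo", "/Producer", "/Creator"};
        for (String marker : markers) {
            String result = containsMarker(data, marker)
                    ? "found"
                    : "not found in the uncompressed bytes";
            System.out.println(marker + ": " + result);
        }
    }
}
```

If `/StructTreeRoot` never appears, the table in the DOCX is most likely reconstructed from the positions of the text and lines on the page, and I would have to do the same kind of position-based analysis myself.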
I need to understand how the PDF was created, if that is possible. I know that working with PDFs is not easy, but I need something useful to start with. Thanks!