I need to compare and get all the visual differences in the two PDF files. I know there are some questions related to this on stack overflow but they are not fulfilling my need.
I'm currently using PDFBox to generate images for pages in PDF and comparing the bytes of the images.
By this approach I'm able to know that particular page is differing.
But I need to find to know some more fine details such as font size of some text, for say - "The text" is differing in the page number, say 6 in the PDFs.
Not only for text but I need to take care of all the visual differences such as images, text in the charts etc.
Please suggest me someway to achieve this.
PS: I tried using Apache Tika but I'm getting the sense that it could be used to get structured text in XHTML and metadata. But I'm seeing the fine details such as font size, font eight is not appearing in structured text. Please correct me if I'm getting it wrong.