
I am using itextsharp.dll to read PDF files. We need to convert a PDF file to a text file while keeping the formatting.

For example, if the PDF contains: Hello World

I need this output in the text file: "Hello <b>world</b>"

How can I detect text formatting in a PDF?

Can someone help?

  • Why do you think that is even possible? Some fonts don't even know that they are bold. See [How to check if a font is bold?](http://developers.itextpdf.com/question/how-check-if-font-bold) You are asking for something that is extremely difficult and you won't find software that can do this in a 100% reliable way. – Bruno Lowagie Jul 11 '16 at 16:58
  • I [wrote this](http://stackoverflow.com/a/6884297/231316) many years ago as an example of extracting text with minimal formatting using HTML as the export medium. As it stands, it can extract font name and size and does a little bit of a job at guessing about the boldness of a font. As Bruno said, this is a rather complicated subject and it might work with some PDFs and completely break with others. If you've got "simple" PDF (and you're lucky), you might be able to use this as a start. – Chris Haas Jul 11 '16 at 17:14
  • Hello Chris, thanks for the help. Following your example, I can extract bold and italic text, but I can't extract underline, superscript, or subscript letters. Do you have any ideas on this? – Patel Mayank Jul 13 '16 at 18:14
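
For what it's worth, the guessing discussed in these comments can be sketched independently of iTextSharp. The snippet below is only a rough illustration in Python, not the code from the linked answer and not an iText API: all function names, inputs, and thresholds are assumptions made up for the example. It assumes you have already pulled, for each text chunk, the font name, font size, and baseline position (PDF y coordinates grow upward), plus any horizontal line segments drawn on the page. Underline is usually not a font property at all; it tends to be drawn as a separate line or thin rectangle under the text, which is why it has to be matched geometrically.

```python
import re


def strip_subset_prefix(font_name):
    """Drop the six-letter subset tag, e.g. 'ABCDEF+Arial-BoldMT' -> 'Arial-BoldMT'."""
    return re.sub(r"^[A-Z]{6}\+", "", font_name)


def guess_style(font_name):
    """Guess (bold, italic) from the font name alone. Purely a naming heuristic."""
    name = strip_subset_prefix(font_name).lower()
    bold = any(key in name for key in ("bold", "black", "heavy"))
    italic = any(key in name for key in ("italic", "oblique"))
    return bold, italic


def guess_script(baseline_y, font_size, line_baseline_y, line_font_size):
    """Classify a chunk as 'super', 'sub', or 'normal' relative to its line.

    The 0.9 and 0.1 factors are arbitrary assumptions, not values from any spec.
    """
    smaller = font_size < 0.9 * line_font_size
    shift = baseline_y - line_baseline_y
    threshold = 0.1 * line_font_size
    if smaller and shift > threshold:
        return "super"
    if smaller and shift < -threshold:
        return "sub"
    return "normal"


def is_underlined(x0, x1, baseline_y, font_size, segments):
    """Check whether any horizontal segment (sx0, sx1, sy) sits just below the text.

    A segment counts if it covers at least 80% of the chunk width and lies
    between the baseline and a quarter of the font size below it (assumed limits).
    """
    width = x1 - x0
    for sx0, sx1, sy in segments:
        overlap = min(x1, sx1) - max(x0, sx0)
        close_below = baseline_y - 0.25 * font_size <= sy <= baseline_y
        if width > 0 and overlap >= 0.8 * width and close_below:
            return True
    return False
```

As both commenters warn, this kind of heuristic will work on some PDFs and fail on others: a font can be bold without saying so in its name, and a raised small glyph is not always a superscript.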
