
I am new to the programming world and have just been given a task to read a PDF file that is in Hindi. There are thousands of references on the internet about how to read a PDF file, but unfortunately I could not find any that deal with a specific language.

Can anyone please help me with this?

Note: Each time I read the PDF, the text comes out unreadable.

Widor
  • Please read http://stackoverflow.com/questions/10900838/read-localized-pdf-file-using-itextsharp and http://stackoverflow.com/questions/10185643/reading-pdf-content-using-itextsharp-in-c-sharp/10191879#10191879 – HatSoft Jul 20 '12 at 10:59
  • The problem with PDF and complex scripts is that PDF usually just says »This glyph goes at this position on the page« and there's no relationship to text anymore. Usually Unicode text is converted into glyphs by the layout engine, so what is displayed doesn't necessarily match the code points used in the text (e.g. ligatures, etc.). But with PDF this layout step was already done, so you usually get meaningless gibberish when trying to extract text. (Depending on the application used to produce the PDF, this even encompasses ASCII.) – Joey Jul 20 '12 at 11:13
  • Thanks Joey, I will try to do something while rendering [extracting] the contents from the file. – Mahesh Chavan Jul 21 '12 at 04:02
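
To make Joey's point about code points versus glyphs concrete, here is a minimal Python sketch (standard library only, not tied to iTextSharp or any PDF library). It shows that a single Devanagari conjunct as drawn on the page is stored as several Unicode code points, and it offers a rough check for whether a string returned by a text extractor is actually Devanagari text. The helper names `code_points` and `devanagari_ratio` and the sample strings are hypothetical, chosen only for illustration; the "gibberish" sample is an invented stand-in for whatever an extractor might return.

```python
import unicodedata

# Unicode Devanagari block: U+0900 to U+097F
DEVANAGARI = range(0x0900, 0x0980)


def code_points(text):
    """List each character as 'U+XXXX NAME' to show what is really stored."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '?')}" for c in text]


def devanagari_ratio(text):
    """Fraction of non-whitespace characters that lie in the Devanagari block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(ord(c) in DEVANAGARI for c in chars) / len(chars)


# One visible conjunct, but three code points: KA + VIRAMA + SSA
conjunct = "\u0915\u094D\u0937"
for line in code_points(conjunct):
    print(line)

# Proper Unicode Hindi text scores high; ASCII-looking output scores zero.
hindi = "\u0939\u093F\u0928\u094D\u0926\u0940"   # the word "Hindi" in Devanagari
gibberish = "fgUnh"                               # hypothetical extractor output
print(devanagari_ratio(hindi))
print(devanagari_ratio(gibberish))
```

A low ratio on text that is known to be Hindi suggests the extractor returned font-specific glyph codes rather than Unicode, which matches the symptom described in the question. It does not by itself recover the text.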

0 Answers