I am working on a Unicode (Marathi) based project, and my task is to read Unicode text from a PDF that uses the following fonts:

  1. CDAC-GISTSurekh
  2. CDAC-GISTSurekh+0
  3. CDAC-GISTSurekh+1
  4. CDAC-GISTSurekh+0 Bold
  5. CDAC-GISTSurekh+1 Bold

When I read the PDF using iTextSharp, I get the text as:

ररजज - (एस 13) महरररषष
पररप मतदरर जरदद 2014

where the actual text should be

राज्य - (एस-१३) महाराष्ट्र  प्रारूप मतदार यादी २०१४

Please suggest a solution if anyone has an idea about this.
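To make the mismatch concrete, here is a minimal sketch that lists the code points of the first word in both versions. It is plain Python using only the standard library and is not part of the iTextSharp code; the helper name `describe` is made up for this illustration.

```python
import unicodedata


def describe(text):
    """Return a (hex code point, Unicode name) pair for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


# First word as returned by the extraction, and as it appears in the PDF.
extracted = "ररजज"
expected = "राज्य"

for label, word in (("extracted", extracted), ("expected", expected)):
    print(label)
    for code, name in describe(word):
        print(f"  {code}  {name}")
```

The extracted word contains only the base letters RA and JA, each repeated, while the expected word also contains the vowel sign AA, the virama, and the letter YA. So the dependent signs and the second half of the conjunct come back as plain copies of a base consonant instead of their own code points.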

  • Your output shows you are already reading "Unicode" (if that failed you would not have seen Marathi). Can you provide a link to a sample PDF with this behavior? – Jongware Jan 09 '14 at 12:49
  • Please check this link for the sample PDF: [Click here to view sample pdf](https://www.dropbox.com/s/ezz015t3qdqo5hk/test.pdf) – Pandurang Pailvan Jan 09 '14 at 13:17
  • This PDF has *exactly* the same problem as described in http://stackoverflow.com/a/15566820/2564301, up to and including the same duplicate Unicode code points. This can't be solved by iTextSharp, nor by Acrobat Pro. – Jongware Jan 09 '14 at 13:43
  • Hi, I am also creating a PDF using iTextSharp, but when the PDF is printed with Marathi text, some joined words are not printed correctly. For example, मिरची is printed as मरीची and पत्ते is printed as पतेते. Please suggest a solution. – banny Mar 19 '14 at 08:52

0 Answers