I want to extract Bangla text from a PDF file in C# using the iTextSharp NuGet package. In the PDF the text looks like this: মোঃ শুকুকুর আলী , মোঃ জালাল মিয়া. I want to read the text exactly as it appears there. But when I read it in C# with iTextSharp, it returns �মাঃ জা লাল িম য়া, �মাঃ �ক ু র আলী. How can I solve this problem? I'm attaching my PDF file and code here.

My controller code:

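using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

using (PdfReader reader = new PdfReader(path))
{
    // Only the first page is read here.
    for (int pageNo = 1; pageNo <= 1; pageNo++)
    {
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        string text = PdfTextExtractor.GetTextFromPage(reader, pageNo, strategy);
    }
    // No explicit reader.Close() is needed: the using block disposes the reader.
}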

The text extracted into the text variable comes out broken, as shown above.

My PDF file: https://drive.google.com/drive/folders/1L18hGoBaSQl8xCUIXVpWUbOnhWtsSPRi?usp=share_link

PDF font details: (screenshot not included)
