
I am trying to convert HTML to PDF using the iTextSharp library (version 5.5.13.2). For some reason, the Romanian special characters (ș, ț, ă, î, â) in the HTML are omitted and do not appear in the PDF file.

```csharp
var htmlString = "Some html string containing romanian characters";

byte[] byteArray = Encoding.Unicode.GetBytes(Format.Invariant(htmlString));
Stream reader = new MemoryStream(byteArray);

Document document = new Document(PageSize.A4, 30, 30, 30, 30);
PdfWriter writer = PdfWriter.GetInstance(document, msOutput);

document.AddTitle(PdfTitle);
document.AddSubject(PdfSubject);
document.AddAuthor(PdfAuthor);
document.AddCreator(PdfCreator);
document.AddKeywords(PdfKeyWords);

document.Open();

XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, reader, null, Encoding.Unicode, new TimesNewRomanUnicodeFontFactory());

document.Close();
```

My `TimesNewRomanUnicodeFontFactory` looks like this:

```csharp
private class TimesNewRomanUnicodeFontFactory : FontFactoryImp
{
    private readonly BaseFont _baseFont;

    public TimesNewRomanUnicodeFontFactory()
    {
        _baseFont = BaseFont.CreateFont(FontFactory.TIMES_ROMAN, BaseFont.CP1250, BaseFont.EMBEDDED);
    }
}
```

Could you please help me?
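One thing worth checking is the `BaseFont.CP1250` encoding passed to `CreateFont`. Windows-1250 is a single-byte code page: it contains ă, â and î, but for s and t it only has the older cedilla forms ş/ţ (U+015F/U+0163), not the comma-below letters ș/ț (U+0219/U+021B) that modern Romanian text uses. The stdlib-only sketch below round-trips each character through .NET's code page 1250 and lists the ones that do not survive. `CodePageCoverage` and `Unencodable` are hypothetical names (not part of iTextSharp), and the `RegisterProvider` call assumes .NET Core 3.0 or later; on .NET Framework, code page 1250 is built in and that line can be dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CodePageCoverage
{
    // Hypothetical helper: returns the characters of `text` that do not survive
    // a round trip through the given Windows code page.
    public static List<char> Unencodable(string text, int codePage)
    {
        // Explicit replacement fallbacks replace the default best-fit mapping,
        // so a character with no exact slot in the code page comes back as "?".
        Encoding enc = Encoding.GetEncoding(
            codePage,
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));

        var missing = new List<char>();
        foreach (char c in text)
        {
            string original = c.ToString();
            string roundTrip = enc.GetString(enc.GetBytes(original));
            if (roundTrip != original)
            {
                missing.Add(c);
            }
        }
        return missing;
    }

    public static void Main()
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where legacy code
        // pages must be registered first. Not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        const string romanian = "ăâîșțĂÂÎȘȚ";
        Console.WriteLine("Not in CP1250: " + string.Join(" ", Unencodable(romanian, 1250)));
    }
}
```

For the string in `Main`, the reported characters should be the four comma-below letters ș, ț, Ș and Ț, which suggests that switching to another single-byte code page is unlikely to be a complete fix; a Unicode encoding such as `BaseFont.IDENTITY_H` is not limited in this way.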

  • Does this answer your question? https://stackoverflow.com/questions/37442332/what-encoding-to-set-for-itextsharp-romanian-language/37443034 – Akshay G Jul 28 '21 at 07:56
  • The problem is the PDF format itself, not iTextSharp. PDF is essentially ASCII. To use non-ASCII text you need to encode the text *and* use a font that covers the language you want. That's why there are *dozens* of questions like [Unicode in PDF](https://stackoverflow.com/questions/128162/unicode-in-pdf) and [How to write UTF-8 characters to a pdf file using itextsharp?](https://stackoverflow.com/questions/6110311/how-to-write-utf-8-characters-to-a-pdf-file-using-itextsharp). Libraries like iText and iTextSharp take care of encoding, but you still need the font. – Panagiotis Kanavos Jul 28 '21 at 08:17
  • In your case, you just need to load the `Times New Roman` font that comes with Windows. All TrueType and OpenType fonts on Windows are Unicode, even if they don't cover *every* available character. Almost all of them include glyphs for the `Latin Extended-B` range, which includes the Romanian characters. – Panagiotis Kanavos Jul 28 '21 at 08:30
  • Thanks all for the input. I've tried using the base font that comes with Windows, `_baseFont = BaseFont.CreateFont("c:\\windows\\fonts\\times.ttf", BaseFont.CP1250, BaseFont.EMBEDDED);`, but the outcome is the same (the Romanian characters do not appear). – Petri Adrian Jul 28 '21 at 09:02
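Two details in the posted factory may explain why loading `times.ttf` did not change anything. First, XMLWorker obtains fonts through the `IFontProvider` passed to `ParseXHtml`, which in `FontFactoryImp` means its `GetFont` methods; the posted class never overrides one, so as far as the code shown goes, `_baseFont` is created but never used. Second, `CP1250` has no slot for ș/ț. A minimal sketch, assuming iTextSharp 5.5.x with XMLWorker and Times New Roman installed as `times.ttf` in the Windows fonts folder (file name and location are assumptions), overrides the seven-parameter `GetFont` overload and creates the font with `BaseFont.IDENTITY_H`:

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Same name as the question's factory, made public and top-level for this sketch.
public class TimesNewRomanUnicodeFontFactory : FontFactoryImp
{
    private readonly BaseFont _baseFont;

    public TimesNewRomanUnicodeFontFactory()
    {
        // Assumed location of Times New Roman on Windows.
        string fontPath = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "times.ttf");

        // IDENTITY_H addresses glyphs by Unicode code point, so ș/ț (U+0219/U+021B)
        // are not restricted by a single-byte code page such as CP1250.
        _baseFont = BaseFont.CreateFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
    }

    // FontFactoryImp's six-parameter GetFont (the IFontProvider member) forwards to
    // this overload, so overriding it makes every requested font use _baseFont.
    public override Font GetFont(string fontname, string encoding, bool embedded,
                                 float size, int style, BaseColor color, bool cached)
    {
        return new Font(_baseFont, size, style, color);
    }
}
```

The factory would be passed to `ParseXHtml` exactly as in the question. Returning the same `BaseFont` for every `fontname` ignores any `font-family` set in the HTML's CSS; that is a deliberate simplification here, and any TrueType or OpenType font that covers the Latin Extended-B range could be substituted for `times.ttf`.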
