HTML to PDF conversion: Unicode characters rendering as empty

Question

I am converting some HTML to PDF using iTextSharp. First I fill a StringWriter with the HTML string, then I use the code below to convert it into a PDF byte array.

The problem is that Unicode characters (Arabic specifically) render as empty.

My code is:

var sw = new StringWriter();
                sw = GetHtmlContent();// here i fetch html
                byte[] data;
                using (var sr = new StringReader(sw.ToString()))
                {                   
                    using (var ms = new MemoryStream())
                    {
                        using (var pdfDoc = new Document())
                        {
                            //Bind a parser to our PDF document
                            using (var htmlparser = new HTMLWorker(pdfDoc))
                            {
                                //Bind the writer to our document and our final stream
                                using (var w = PdfWriter.GetInstance(pdfDoc, ms))
                                {
                                    pdfDoc.Open();
                                    //Parse the HTML directly into the document
                                    htmlparser.Parse(sr);
                                    pdfDoc.Close();
                                    //Grab the bytes from the stream before closing it
                                    data = ms.ToArray();
                                }
                            }
                        }
                    }
                }
                Response.Buffer = false;
                Response.Clear();
                Response.ClearContent();
                Response.ClearHeaders();
                Response.ContentType = "application/pdf";
                Response.AddHeader("Content-Disposition", "attachment; filename=Test.pdf");
                Response.BinaryWrite(data);
                Response.End();

Please help me find what is wrong with it.

The problem is probably encoding-related, in your `sw = GetHtmlContent()` — ale, Jun 25 '14 at 11:05
To test @Infer-On's comment, skip your `GetHtmlContent()` for now and try working with inline HTML as [the sample you got this from shows](http://stackoverflow.com/a/23246169/231316). If that works then your problem is with `GetHtmlContent()`. If that doesn't work, it's probably a font problem. Are you specifying a font capable of handling those characters? iText will use Helvetica by default, which does not have any Arabic glyphs. — Chris Haas, Jun 25 '14 at 14:02
If some text works but not others then you probably have a font problem. iTextSharp does not use system fonts unless you tell it to. The preferred method is to register the individual font via `iTextSharp.text.FontFactory.Register()`. If you have multiple fonts you can use `iTextSharp.text.FontFactory.RegisterDirectory()`. If you just want to scan the entire system font folder (this could be really slow) you can use `iTextSharp.text.FontFactory.RegisterDirectories()`. Then see this for how to use the font once it's registered: http://stackoverflow.com/a/4903223/231316 — Chris Haas, Jun 25 '14 at 20:46
As I commented earlier, I know about font registration, but the font [ARIALUNI.TTF] suggested for registration is not guaranteed to exist on the system. That's why I need an alternative. I have already implemented the above solution, and to avoid a missing-font problem I have copied the fonts into my local directory. But that doesn't seem to be a good solution, which is why I am looking for an alternative. — Kamran Shahid, Jun 26 '14 at 04:38
There might be a language barrier issue but your comments are conflicting. "we are guarantee that font's is available" and then " is not guaranteed to exists". Please update your code above with how you are registering the fonts. Also, please post a very small example of your HTML (one small paragraph should be fine) showing how you are using those fonts. Remember, unless your HTML actually says otherwise, or unless you have C# code that changes things, iTextSharp will always use Helvetica. You cannot change iTextSharp's "default font". — Chris Haas, Jun 26 '14 at 13:35
Yes, you noticed that correctly. I meant to write that we are not guaranteed to have the font installed on the server. Thanks, Chris — Kamran Shahid, Jun 26 '14 at 15:22
In the end I copied the font into my application directory and referenced it from there — Kamran Shahid, Jul 04 '14 at 10:29
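
The approach the comments settle on (ship the font with the application and register it from there) might look roughly like the sketch below. The `Fonts` subfolder, the `AppFonts` class, and the alias are assumptions made for illustration; `FontFactory.Register` is the call named in the comments above.

```csharp
using System;
using System.IO;
using iTextSharp.text;

public static class AppFonts
{
    // Hypothetical alias; it must match the font face the HTML or style sheet asks for.
    public const string Alias = "arial unicode ms";

    // Builds the path to a font file shipped in an assumed "Fonts" subfolder.
    public static string ResolveFontPath(string baseDirectory, string fileName)
    {
        return Path.Combine(baseDirectory, "Fonts", fileName);
    }

    // Registers the bundled font so rendering no longer depends on the font
    // being installed on the server.
    public static void RegisterBundledFont()
    {
        string path = ResolveFontPath(AppDomain.CurrentDomain.BaseDirectory, "ARIALUNI.TTF");
        if (!File.Exists(path))
            throw new FileNotFoundException("Bundled font not found.", path);

        // Skip the registration if the alias is already known to the factory.
        if (!FontFactory.IsRegistered(Alias))
            FontFactory.Register(path, Alias);
    }
}
```

Calling `AppFonts.RegisterBundledFont()` once at application start should be enough. Registration alone does not change the output, because the HTML or style sheet still has to select the registered face.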

score 0 · Answer 1 · answered Dec 23 '15 at 18:23

Follow these steps to display Unicode characters when converting HTML to PDF:

Create an HTMLWorker
Register a Unicode font and assign it
Create a style sheet and set the encoding to Identity-H
Assign the style sheet to the HTML parser

Sample code:

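    // Register the Unicode font first so the style sheet can refer to it by alias.
    FontFactory.Register("C:\\Windows\\Fonts\\ARIALUNI.TTF", "arial unicode ms");

    // Select the registered font for the body and use Identity-H encoding.
    iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet();
    ST.LoadTagStyle("body", "face", "arial unicode ms");
    ST.LoadTagStyle("body", "encoding", "Identity-H");

    TextReader reader = new StringReader(html);
    Document document = new Document(PageSize.A4, 30, 30, 30, 30);
    PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(FileName, FileMode.Create));
    HTMLWorker worker = new HTMLWorker(document);
    document.Open();
    worker.Style = ST;
    worker.StartDocument();
    worker.Parse(reader);   // feed the HTML through the worker
    worker.EndDocument();
    worker.Close();
    document.Close();       // finish the PDF and flush it to the file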

See the link below for more detail:

Display Unicode characters in converting Html to Pdf

Hindi, Turkish, and special characters are also displayed when converting from HTML to PDF using this method.

Except that HTMLWorker is obsolete and you should use XMLWorker instead. — Amedee Van Gasse, Dec 23 '15 at 22:22
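
For readers following that suggestion, here is a minimal sketch of the same conversion using XMLWorker. It assumes the separate iTextSharp XMLWorker assembly (`iTextSharp.tool.xml`) is referenced; the `XmlWorkerSketch` class and its parameters are hypothetical names. The input has to be well-formed XHTML and must itself name the registered font family. Right-to-left handling for Arabic is not covered here.

```csharp
using System.IO;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

public static class XmlWorkerSketch
{
    // Converts an XHTML string to PDF bytes; fontPath points to a font file
    // that covers the characters in the document (may be null to skip).
    public static byte[] HtmlToPdf(string xhtml, string fontPath)
    {
        // Font provider that only knows fonts registered explicitly,
        // instead of scanning the system font directories.
        var fonts = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
        if (!string.IsNullOrEmpty(fontPath))
            fonts.Register(fontPath);

        using (var ms = new MemoryStream())
        {
            using (var doc = new Document())
            {
                PdfWriter writer = PdfWriter.GetInstance(doc, ms);
                doc.Open();
                using (var html = new MemoryStream(Encoding.UTF8.GetBytes(xhtml)))
                {
                    // A null CSS stream falls back to XMLWorker's default style sheet.
                    XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, html, null, Encoding.UTF8, fonts);
                }
                doc.Close();
            }
            // ToArray still works after the stream has been closed by the document.
            return ms.ToArray();
        }
    }
}
```

The returned bytes can be passed to `Response.BinaryWrite` exactly as in the question's code.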
