How to write UTF-8 characters to a PDF file using iTextSharp?

Question

I have searched Google extensively but have not been able to find a solution.

Any help is appreciated.

Here is my code:

protected void Page_Load(object sender, EventArgs e)
{
    StreamReader read = new StreamReader(@"D:\queryUnicode.txt", Encoding.Unicode);
    string str = read.ReadToEnd();

    Paragraph para = new Paragraph(str);

    FileStream file = new FileStream(@"D:\Query.pdf", FileMode.Create);

    Document pdfDoc = new Document();
    PdfWriter writer = PdfWriter.GetInstance(pdfDoc, file);

    pdfDoc.Open();
    pdfDoc.Add(para);
    pdfDoc.Close();

    Response.Write("Pdf file generated");
}

What problems are you seeing? If it's missing characters then have a look here: http://stackoverflow.com/questions/1322303/html-to-pdf-some-characters-are-missing-itextsharp – Nick, May 24 '11 at 12:27
Yes, the characters are missing in the PDF, but I have already seen and tried that link. When I downloaded the iTextSharp source code, it didn't have the `FactorySettings.cs` file in it. Also, he is using "arial.ttf", and I want UTF-8 characters. – teenup, May 24 '11 at 12:35
Actually, the Notepad file from which I was fetching the string was saved with ANSI encoding. When I changed it to UTF-8, those characters started showing up in the PDF as `&#230;`. – teenup, May 24 '11 at 12:47

score 23 · Accepted Answer · edited May 23 '17 at 12:34

Are you converting HTML to PDF? If not, never mind. The only reason I ask is that your last comment about how æ shows up makes me think that you are. If so, check out this post: iTextSharp 5 polish character

Also, sometimes when people say "Unicode", what they're really trying to do is get symbols like Wingdings into a PDF. If that's what you mean, check out this post, and note that Unicode and Wingdings symbols really aren't related at all: Unicode symbols in iTextSharp

Here's a complete working example that shows two ways to write Unicode characters, one using the character itself and one using the C# escape sequence. Make sure to save your source file in an encoding that supports these characters, such as UTF-8. This sample uses iTextSharp 5.0.5.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create our document object
            Document Doc = new Document(PageSize.LETTER);

            //Create our file stream
            using (FileStream fs = new FileStream(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Test.pdf"), FileMode.Create, FileAccess.Write, FileShare.Read))
            {
                //Bind PDF writer to document and stream
                PdfWriter writer = PdfWriter.GetInstance(Doc, fs);

                //Open document for writing
                Doc.Open();

                //Add a page
                Doc.NewPage();

                //Full path to the Unicode Arial file
                string ARIALUNI_TTF = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");

                //Create a base font object making sure to specify IDENTITY-H
                BaseFont bf = BaseFont.CreateFont(ARIALUNI_TTF, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);

                //Create a specific font object
                Font f = new Font(bf, 12, Font.NORMAL);

                //Write some text, the last character is 0x0278 - LATIN SMALL LETTER PHI
                Doc.Add(new Phrase("This is a test ɸ", f));

                //Write some more text, the last character is 0x0682 - ARABIC LETTER HAH WITH TWO DOTS VERTICAL ABOVE
                Doc.Add(new Phrase("Hello\u0682", f));

                //Close the PDF
                Doc.Close();
            }
        }
    }
}

When working with iTextSharp you have to make sure that you're using a font that supports the Unicode code points that you want to use. You also need to specify IDENTITY-H when using your font. I don't completely know what it means but there's some talk about it here: iTextSharp international text
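
For the case in the question, where the text comes from a file rather than from a string literal, the same idea might look like the sketch below. It is only a sketch under a few assumptions: the text file really is saved as UTF-8, ARIALUNI.TTF is installed in the system Fonts folder, and the paths are the ones used elsewhere in this thread. The names `Utf8PdfSketch`, `ReadUtf8Text`, and `WritePdf` are hypothetical and exist only for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper class, not part of iTextSharp.
static class Utf8PdfSketch
{
    // Read the whole file, decoding its bytes as UTF-8.
    // Assumption: the file was actually saved as UTF-8.
    public static string ReadUtf8Text(string path)
    {
        return File.ReadAllText(path, Encoding.UTF8);
    }

    // Write the text into a one-paragraph PDF using a font created with IDENTITY-H.
    public static void WritePdf(string text, string fontPath, string pdfPath)
    {
        //Create a base font object making sure to specify IDENTITY-H
        BaseFont bf = BaseFont.CreateFont(fontPath, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
        Font font = new Font(bf, 12, Font.NORMAL);

        Document doc = new Document(PageSize.LETTER);
        using (FileStream fs = new FileStream(pdfPath, FileMode.Create, FileAccess.Write, FileShare.Read))
        {
            //Bind PDF writer to document and stream
            PdfWriter.GetInstance(doc, fs);

            doc.Open();
            //Pass the font explicitly so the default font is not used
            doc.Add(new Paragraph(text, font));
            doc.Close();
        }
    }

    static void Main()
    {
        // Paths taken from this thread; adjust them for your machine.
        string text = ReadUtf8Text(@"D:\queryUnicode.txt");
        string fontPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
        WritePdf(text, fontPath, @"D:\Query.pdf");
    }
}
```

The two things that differ from the code in the question are the encoding passed when reading the file and the font passed to `Paragraph`. If the file is saved in a different encoding, the matching `Encoding` would need to be used instead.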

answered May 24 '11 at 13:47 by Chris Haas · edited May 23 '17 at 12:34 by Community

@Chris, the characters you have written, i.e. ɸ and \u0682, are coming out correctly, but the characters in my file are still coming out in code form. For example, the character `æ` is coming out as `&#230;` and `ø` is coming out as `&#248;`. These display fine on the web page in the GridView, and I have used UTF-8 in the Response Content Type. – teenup May 25 '11 at 05:06
@Chris, if I write these characters in code, i.e. `new Phrase("æ ø å", font)`, then they come out fine. But I am fetching the text from a text file saved as UTF-8, reading it into a string using StreamReader, and then passing this string to the `Phrase` constructor. – teenup May 25 '11 at 06:10
@Puneet Dudeja, you are talking about a GridView and also a text file; which are you working with? These are two separate things that you need to explain further in your question. For the text file, are you sure that it's UTF-8 encoded (have you checked it with a hex editor)? How are you fetching the text file? File system or web? For the GridView, how are you fetching that? Please edit your post above with some code so we can help you better. – Chris Haas May 25 '11 at 12:53
@Chris, I have included the whole code in my question. This code also includes the last two lines of your example code, and those characters are coming out fine in the PDF. But the other characters in my text file (Swedish characters) are coming out in #-encoded form. Please help. – teenup May 27 '11 at 07:52
@Puneet Dudeja, are you able to email me the contents of the file queryUnicode.txt? I understand if it's confidential, but it would help to see it. If you can, zip it and send it. Also, and this is true in general for debugging anything, it would help if you could remove any code above that isn't causing a problem. For instance, there's code that creates headers in a table; that can be removed when posting here because it's not part of the problem. In general, if you can get it down to the smallest amount of code that still breaks, then we are more likely to be able to find the problem. – Chris Haas May 27 '11 at 12:57
@Chris, I was shocked when I tried to reproduce the problem after isolating the minimum code required: it worked like a charm. The Swedish characters are coming out in the PDF, and that without using any special Font object. But the environment was different, as I tried it on my home machine. I don't know why it is not working in the office. I have posted that minimum code in my question. I will try again in my office, and if I get the problem again, I will mail the file to you (it's not confidential). Many thanks for your help. Please help me again if I get the problem on Monday. – teenup May 27 '11 at 14:58
@Chris, also, how will I mail you the file? There is no way on Stack Overflow to send you the file. – teenup May 27 '11 at 15:02
@Puneet Dudeja, hope everything goes well for you. I'll be out Monday, so if you have any problems you probably won't hear from me until Tuesday. – Chris Haas May 27 '11 at 16:06
Thanks, the BaseFont.IDENTITY_H is working for me. Kool beans! – minhnguyen Apr 04 '15 at 17:35
'Identity-H' is not recognized. – Sras Oct 09 '20 at 05:04

score 0 · Answer 2 · answered Mar 19 '22 at 08:11

public static Font GetArialUtf8Font(int fontSize = 9)
{
    string fontPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIAL.TTF");

    //Create a base font object making sure to specify IDENTITY-H
    BaseFont bf = BaseFont.CreateFont(fontPath, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);

    //Create a specific font object
    return new Font(bf, fontSize, Font.NORMAL);
}
