1

I am using ASP.NET and need to generate reports in PDF format and send the file back to clients so they can download it.

I built the reports with the MigraDoc library and they looked great, but when I tried Arabic text I found that it was rendered left-to-right and the characters were disconnected, so I wrote this code to test things out:

    ...............
    MigraDoc.DocumentObjectModel.Document reportDoc = new MigraDoc.DocumentObjectModel.Document();
    reportDoc.Info.Title = "test";
    sec = reportDoc.AddSection();
    string fileName = "test.pdf";
    addformattedText(sec, "العبارة", true);
    PdfDocumentRenderer renderer = new PdfDocumentRenderer(true);
    renderer.Document = reportDoc;
    renderer.RenderDocument();
    MemoryStream pdfStream = new MemoryStream();
    renderer.PdfDocument.Save(pdfStream);
    byte[] bytes = pdfStream.ToArray();
    ...............


    private void addformattedText(Section sec,string text, bool shouldBeBold = false)
    {
        var tf = sec.AddTextFrame();
        var p = tf.AddParagraph(text);
        p.Format.Font.Name = "Tahoma";
        if (shouldBeBold) p.Format.Font.Bold = true;
    }

I get output like this: (image: test pdf output)

I tried encoding the text as an escaped Unicode string using this code:

 private string getEscapedString(string text)
 {
     if (true || HasArabicCharacters(text))
     {
         string uString = "";
         byte[] utfBytes = Encoding.Unicode.GetBytes(text);
         foreach (var u in utfBytes)
         {
             if (u != 0)
             {
                 uString += String.Format(@"\u{0:x4}", u);
             }
         }
         return uString;
     }
     else
         return text;
 }

I then put the returned string into a paragraph and saved the PDF document with the unicode parameter set to true, but the result is the same.

I cannot figure out how to get this working.

The reports were done using MigraDoc 1.50.5147 library.

Jood jindy
  • Works for me, see a [.NET fiddle](https://dotnetfiddle.net/u3NJoQ). Always remember to post a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – Prolog Nov 16 '20 at 21:20
  • OK, thanks a lot, I'll try it. – Jood jindy Nov 16 '20 at 21:41
  • I tried it and it gave me the same output. – Jood jindy Nov 16 '20 at 21:49
  • I'm a bit confused. What is the result you are expecting? – Prolog Nov 16 '20 at 22:23
  • The string is "العلامة", but in the PDF it appears as in the picture above. I need it to look the same in the PDF as it does in C# (connected characters with the correct glyphs), and I do not know how to do that. – Jood jindy Nov 17 '20 at 06:33
  • @Joodjindy what you see in Windows and .NET (not just ASP.NET) has little to do with PDF. Both Windows and .NET strings are Unicode, period. Your strings **already are Unicode strings**. You don't need to escape anything to type Arabic or Chinese and this question proves it - SO is an ASP.NET application. *PDF* on the other hand is tricky - it's not even a document format, it's a set of printing instructions. As the answers in the probable duplicate [Unicode in PDF](https://stackoverflow.com/questions/128162/unicode-in-pdf) explain, it's a mess – Panagiotis Kanavos Nov 17 '20 at 07:43
  • @Joodjindy as the answers in `Unicode in PDF` show, metadata can be Unicode but the rendered text needs work. If you get a reversed string it means PdfSharp already handles Unicode glyphs but not RTL languages – Panagiotis Kanavos Nov 17 '20 at 07:48
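
To illustrate the point made in the comments above, here is a minimal sketch showing that a .NET string already holds UTF-16 code units, and what the byte loop from the question actually produces. The class and method names (`UnicodeCheck`, `CodeUnits`, `EscapeBytes`) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration only; not part of MigraDoc or PDFsharp.
public static class UnicodeCheck
{
    // Lists the UTF-16 code units already stored in the string,
    // e.g. "U+0627 U+0644" for the two letters alef and lam.
    public static string CodeUnits(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    // Mirrors the byte loop from the question: it formats each non-zero
    // *byte* of the UTF-16 encoding, so one letter becomes two unrelated
    // literal "\uXXXX" text fragments instead of one character.
    public static string EscapeBytes(string text)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.Unicode.GetBytes(text))
        {
            if (b != 0) sb.AppendFormat(@"\u{0:x4}", b);
        }
        return sb.ToString();
    }
}
```

For the single letter alef (U+0627), `EscapeBytes` yields the twelve literal characters `\u0027\u0006`, which is why the escaped text changes nothing about shaping or direction in the PDF.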

2 Answers

3

The problem is that each Arabic letter has up to four different shapes (initial, medial, final, and isolated), and PDFsharp and MigraDoc cannot work out which shape to print. Furthermore, you need to reverse the character order. To solve this you can use AraibcPdfUnicodeGlyphsResharper, as follows:

    using PdfSharp.Drawing;
    using PdfSharp.Pdf;
    using AraibcPdfUnicodeGlyphsResharper;

    namespace MigraDocArabic
    {
        internal class PrintArabicUsingPdfSharp
        {
            public PrintArabicUsingPdfSharp(string path)
            {
                PdfDocument document = new PdfDocument();
                document.Info.Title = "Created with PDFsharp";
                System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);

                // Create an empty page
                PdfPage page = document.AddPage();

                // Get an XGraphics object for drawing
                XGraphics gfx = XGraphics.FromPdfPage(page);

                // Create a font
                XFont font = new XFont("Arial", 20, XFontStyle.BoldItalic);
                var xArabicString = "كتابة اللغة  العربية شيئ جميل".ArabicWithFontGlyphsToPfd();

                // Draw the text
                gfx.DrawString("Hello, World!", font, XBrushes.Black, new XRect(0, 0, page.Width, page.Height), XStringFormats.Center);
                gfx.DrawString(xArabicString, font, XBrushes.Black, new XRect(50, 50, page.Width, page.Height), XStringFormats.Center);

                // Save the document...
                document.Save(path);
            }
        }
    }

Do not forget the extension method. By the way, this works with iText7 too. See the image for the result:

(image: Result)

0

PDFsharp does not support RTL languages yet: http://www.pdfsharp.net/wiki/PDFsharpFAQ.ashx#Does_PDFsharp_support_for_Arabic_Hebrew_CJK_Chinese_Japanese_Korean_6

You can work around this limitation by reversing the string.

PDFsharp does not support font ligatures yet. You are probably able to work around this limitation by replacing letters with the correct glyph (start, middle, end) depending on the position.
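
The two workarounds above (pick the positional glyph, then reverse the string) can be sketched roughly as follows. This is only an illustration under stated assumptions, not PDFsharp or MigraDoc functionality: the class `ArabicShaper` and its methods are hypothetical, it relies on the Unicode Arabic Presentation Forms-B block (starting at U+FE80), and it covers only the basic letters U+0621 to U+064A. It ignores diacritics, lam-alef ligatures, and mixed Latin text or digits (which would be reversed too), and it assumes the chosen font contains the presentation-form glyphs.

```csharp
using System;
using System.Text;

// Hypothetical helper; a simplified sketch, not a full Arabic shaping engine.
public static class ArabicShaper
{
    private const char First = '\u0621';
    private const char Tatweel = '\u0640';

    // Number of presentation forms per code point from U+0621 to U+064A:
    // 1 = isolated only, 2 = isolated + final,
    // 4 = isolated, final, initial, medial, 0 = not a shaped letter.
    private const string FormCounts =
        "12222424244444222244444444" + "000000" + "4444444224";

    // Isolated-form code point of each letter; its other forms follow it
    // directly (final, initial, medial) in Presentation Forms-B.
    private static readonly char[] IsolatedForm = BuildIsolatedForms();

    private static char[] BuildIsolatedForms()
    {
        var result = new char[FormCounts.Length];
        int next = 0xFE80;
        for (int i = 0; i < FormCounts.Length; i++)
        {
            result[i] = (char)next;
            next += FormCounts[i] - '0';
        }
        return result;
    }

    private static int Forms(char c)
    {
        return c >= First && c - First < FormCounts.Length
            ? FormCounts[c - First] - '0'
            : 0;
    }

    private static bool JoinsToNext(char c) { return c == Tatweel || Forms(c) == 4; }
    private static bool JoinsToPrevious(char c) { return c == Tatweel || Forms(c) >= 2; }

    // Replaces each basic Arabic letter with its positional glyph (logical order).
    public static string Shape(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            int forms = Forms(c);
            if (forms == 0) { sb.Append(c); continue; }

            bool joinPrev = forms >= 2 && i > 0 && JoinsToNext(text[i - 1]);
            bool joinNext = forms == 4 && i + 1 < text.Length && JoinsToPrevious(text[i + 1]);
            int offset = joinPrev ? (joinNext ? 3 : 1) : (joinNext ? 2 : 0);
            sb.Append((char)(IsolatedForm[c - First] + offset));
        }
        return sb.ToString();
    }

    // Shapes the text, then reverses it so an LTR-only renderer draws it right to left.
    public static string ShapeAndReverse(string text)
    {
        char[] chars = Shape(text).ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }
}
```

In the question's helper, the result could then be passed on as `tf.AddParagraph(ArabicShaper.ShapeAndReverse(text))`. For real documents a dedicated shaping library is likely to be the safer choice, since the sketch skips the cases listed above.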

  • That is the main point. In Arabic there are a lot of possibilities, so how can I replace every character with the right glyph? I saw that there are different glyphs for the same character in UTF-16, so I wonder if there is a relation. If there is a library that gives me the UTF-16 codes for an Arabic word with the right glyphs, that would be great. Recently I have been trying iText7. – Jood jindy Nov 17 '20 at 07:48
  • I know hardly anything about Arabic. It seems every letter has a standalone shape, a start shape, an end shape, and a middle shape. Split the text into words, then use the correct codes for start, middle, end, standalone. Just guessing - I hope it works that way. – I liked the old Stack Overflow Nov 17 '20 at 08:58