Solved: C# MemoryStream converted to byte array, then to base64 string removes RTL characters

Question

I am adding page numbers to a PDF file.

It works correctly with English, but when I try to add Hebrew text, it omits those letters.

I assume the problem is with the Base64 encoding. How do I solve this?

Code Example

byte[] myBinary = File.ReadAllBytes(path);
using (var reader = new PdfReader(myBinary))
{
    using (var ms = new MemoryStream())
    {
        using (var stamper = new PdfStamper(reader, ms))
        {
            int PageCount = reader.NumberOfPages;
            for (int i = 1; i <= PageCount; i++)
            {
                // "מתוך" is Hebrew for "of", so the label reads e.g. "1 of 5"
                ColumnText.ShowTextAligned(stamper.GetUnderContent(i),
                    Element.ALIGN_CENTER, new Phrase(String.Format("{0} מתוך {1}", i, PageCount)), 297f, 15f, 0);
            }
        }
        myBinary = ms.ToArray();
    }
}
string base64EncodedPDF = System.Convert.ToBase64String(myBinary);
return base64EncodedPDF;

On the front end, all I do is download the file:

$scope.open_letter = function (letter) {
    myService.PrintLetter().then(function (data) {
        // data.data holds the Base64 string returned by the C# code above
        var pdfAsDataUri = "data:application/pdf;base64," + data.data;
        var a = document.createElement("a");
        a.href = pdfAsDataUri;
        a.download = "מכתב" + ".pdf"; // "מכתב" is Hebrew for "letter"
        a.click();
    });
};

I am asking because it works perfectly in English but simply omits the Hebrew letters. That surprises me: I would have expected them to be replaced with garbled characters rather than disappear.

what characters are being removed? base-64 is a binary operation - it doesn't care about characters; can you be more specific about what you're doing and why? — Marc Gravell, Jul 15 '20 at 11:23
Please add the code you use for decoding; the problem is very unlikely to be ToBase64String — Patrick Beynio, Jul 15 '20 at 11:37
I think the anchor points may be wrong since Hebrew is right to left. Try changing the 297f. See: https://stackoverflow.com/questions/35280015/itextsharp-showtextaligned-anchor-point — jdweng, Jul 15 '20 at 12:29
You may be missing a font. I searched for the same issue with Turkish: https://stackoverflow.com/questions/50086417/created-pdf-file-is-missing-characters-in-turkish-language — jdweng, Jul 15 '20 at 12:38
Have you tried writing the modified PDF to a file-stream and checked that it works as intended? This would help isolate the problem to either the PdfStamper or the encoding. — JonasH, Jul 15 '20 at 12:47
What font is the styling using? Is the Hebrew letter inside the view (or outside the margins)? Is the Hebrew text behind a different object? — jdweng, Jul 15 '20 at 14:36
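The comments question whether Base64 is really at fault. Base64 maps every 3 input bytes to 4 ASCII characters and back without interpreting them, so it cannot drop Hebrew letters. A minimal sketch (the class and method names are hypothetical, not from the question) that confirms the round trip and, following JonasH's suggestion, writes the decoded PDF to a temporary file so it can be opened directly:

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helpers for isolating the problem; not part of the original question.
public static class Base64RoundTripCheck
{
    // Encodes arbitrary bytes to Base64 and decodes them again.
    // Returns true when the decoded bytes are identical to the input.
    public static bool RoundTrips(byte[] data)
    {
        string encoded = Convert.ToBase64String(data);
        byte[] decoded = Convert.FromBase64String(encoded);
        return decoded.SequenceEqual(data);
    }

    // Decodes the string the server returns and saves it in the temp folder,
    // so the PDF can be inspected without going through the browser download.
    public static string DumpPdf(string base64EncodedPdf, string fileName = "stamped-check.pdf")
    {
        string path = Path.Combine(Path.GetTempPath(), fileName);
        File.WriteAllBytes(path, Convert.FromBase64String(base64EncodedPdf));
        return path;
    }
}
```

For example, `Base64RoundTripCheck.RoundTrips(Encoding.UTF8.GetBytes("1 מתוך 5"))` returns `true`. If the file written by `DumpPdf` already lacks the Hebrew letters, the problem lies in the stamping step rather than the encoding, which is what the accepted answer below found.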

score 0 · Accepted Answer · answered Jul 22 '20 at 09:45

So I finally managed to solve this issue.

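The problem was not a missing font as such: I wasn't passing any font to the Phrase constructor at all.

Without an explicit font, iTextSharp falls back to its default Helvetica font with WinAnsi encoding, which has no Hebrew glyphs, so it silently drops the Hebrew letters while the English ones render normally.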

What I did was this:

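// Gisha ships with Windows and contains Hebrew glyphs; IDENTITY_H lets iTextSharp address any glyph in the font
BaseFont bf = BaseFont.CreateFont("c:/windows/Fonts/GISHA.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
Font f = new Font(bf, 8, Font.NORMAL, BaseColor.BLACK);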

Then, in my page-number loop, I passed that font to the Phrase:

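int PageCount = reader.NumberOfPages;
for (int i = 1; i <= PageCount; i++)
{
    // The Hebrew word is typed reversed ("ךותמ" is "מתוך", "of") and the placeholders are
    // swapped, because this call lays the glyphs out left to right without reordering them.
    ColumnText.ShowTextAligned(stamper.GetUnderContent(i),
        Element.ALIGN_CENTER, new Phrase(String.Format("{1} ךותמ {0}", i, PageCount), f), 297f, 15f, 0);
}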

That solved my issue, and now it works beautifully.
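Reversing the Hebrew word by hand works for a fixed label, but iTextSharp 5 can apply the bidirectional algorithm itself when text is drawn with a right-to-left run direction. A minimal sketch, assuming iTextSharp 5.x, whose `ColumnText.ShowTextAligned` has an overload taking `runDirection` and `arabicOptions`. The helper name `StampPageNumbers`, locating the font through `Environment.SpecialFolder.Fonts`, and switching to `BaseFont.EMBEDDED` (so viewers without Gisha still show the glyphs) are my assumptions, not part of the original answer:

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PageNumberStamper
{
    // Hypothetical helper: stamps "i of n" in Hebrew at x=297, y=15 on each page,
    // the same position the question uses (roughly the horizontal centre of an A4 page).
    public static byte[] StampPageNumbers(byte[] pdf)
    {
        // Look up the Windows fonts folder instead of hard-coding c:/windows/Fonts.
        string fontPath = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "GISHA.ttf");
        // Identity-H gives access to the Hebrew glyphs; embedding means the reader
        // of the PDF does not need Gisha installed.
        BaseFont bf = BaseFont.CreateFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        Font font = new Font(bf, 8, Font.NORMAL, BaseColor.BLACK);

        using (var reader = new PdfReader(pdf))
        using (var ms = new MemoryStream())
        {
            using (var stamper = new PdfStamper(reader, ms))
            {
                int pageCount = reader.NumberOfPages;
                for (int i = 1; i <= pageCount; i++)
                {
                    // Text stays in logical (reading) order; RUN_DIRECTION_RTL makes
                    // iTextSharp reorder it for display, so nothing is reversed by hand.
                    var phrase = new Phrase(string.Format("{0} מתוך {1}", i, pageCount), font);
                    ColumnText.ShowTextAligned(stamper.GetUnderContent(i),
                        Element.ALIGN_CENTER, phrase, 297f, 15f, 0,
                        PdfWriter.RUN_DIRECTION_RTL, 0);
                }
            }
            // The stamper has closed ms by now; MemoryStream.ToArray still works after Close.
            return ms.ToArray();
        }
    }
}
```

The visible result should match the accepted answer's hand-reversed `"{1} ךותמ {0}"`, but the format string stays readable and would also handle longer Hebrew labels.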

score 0 · Answer 2 · answered Jul 22 '20 at 10:02

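A piece of advice: don't use MemoryStream!

Use RecyclableMemoryStream if you want to avoid OutOfMemoryException errors caused by memory fragmentation. A MemoryStream grows by reallocating one contiguous buffer, and buffers of 85,000 bytes or more are allocated on the Large Object Heap, which is not compacted by default.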

https://www.philosophicalgeek.com/2015/02/06/announcing-microsoft-io-recycablememorystream/

Memorystream and Large Object Heap
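A minimal sketch of this suggestion applied to the page-numbering code, assuming the Microsoft.IO.RecyclableMemoryStream NuGet package and iTextSharp 5.x; the class and method names are hypothetical, and the font is the one built in the accepted answer. Two details matter here. PdfStamper closes its output stream by default, which would hand a recyclable stream's buffers back to the pool, so the sketch sets `stamper.Writer.CloseStream = false`. It also encodes straight from `GetBuffer()` with an explicit length, because the library's documentation advises against `ToArray()`, which allocates a full copy:

```csharp
using System;
using iTextSharp.text;
using iTextSharp.text.pdf;
using Microsoft.IO;

public static class PooledPdfEncoder
{
    // One manager per application; it owns the pooled buffers.
    private static readonly RecyclableMemoryStreamManager Manager = new RecyclableMemoryStreamManager();

    // Hypothetical helper: stamps page numbers as in the accepted answer and returns Base64.
    public static string StampToBase64(byte[] pdf, Font font)
    {
        using (var reader = new PdfReader(pdf))
        using (var stream = Manager.GetStream())
        {
            using (var stamper = new PdfStamper(reader, stream))
            {
                // Keep the pooled stream open after the stamper closes.
                stamper.Writer.CloseStream = false;
                int pageCount = reader.NumberOfPages;
                for (int i = 1; i <= pageCount; i++)
                {
                    ColumnText.ShowTextAligned(stamper.GetUnderContent(i),
                        Element.ALIGN_CENTER,
                        new Phrase(string.Format("{1} ךותמ {0}", i, pageCount), font),
                        297f, 15f, 0);
                }
            }
            // GetBuffer() can be longer than the data, so encode only Length bytes.
            return Convert.ToBase64String(stream.GetBuffer(), 0, (int)stream.Length);
        }
    }
}
```

This removes the growing MemoryStream buffer and the extra `ToArray()` copy; the returned Base64 string is still one allocation, which lands on the Large Object Heap for large PDFs.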
