System.Text.Encoding.Default.GetBytes fails

Question

Here is my sample code:

Code Snippet 1: This code runs on my file repository server and returns the file as an encoded string through the WCF service:

byte[] fileBytes = new byte[0];
using (FileStream stream = System.IO.File.OpenRead(@"D:\PDFFiles\Sample1.pdf"))
{
    fileBytes = new byte[stream.Length];
    stream.Read(fileBytes, 0, fileBytes.Length);
    stream.Close();
}

string retVal = System.Text.Encoding.Default.GetString(fileBytes);  // fileBytes size is 209050

Code Snippet 2: The client machine that requested the PDF file receives the encoded string, converts it back to a PDF, and saves it locally.

byte[] encodedBytes = System.Text.Encoding.Default.GetBytes(retVal); /// GETTING corrupted here

string pdfPath = @"C:\DemoPDF\Sample2.pdf";
using (FileStream fileStream = new FileStream(pdfPath, FileMode.Create))  //encodedBytes is 327279
{
    fileStream.Write(encodedBytes, 0, encodedBytes.Length);
    fileStream.Close();
}

The above code works absolutely fine on .NET Framework 4.5 and 4.6.1.

When I use the same code in ASP.NET Core 2.0, it fails to convert the string back to a byte array properly. I am not getting any runtime error, but the final PDF cannot be opened after it is created. The viewer reports that the PDF file is corrupted.

I also tried Encoding.Unicode and Encoding.UTF8, but I get the same error for the final PDF.

I have also noticed that when I use Encoding.Unicode, at least the original byte array and the resulting byte array are the same size. With the other encodings, the byte counts do not match either.

So the question is: is System.Text.Encoding.Default.GetBytes broken in .NET Core 2.0?

I have edited my question for better understanding. Sample1.pdf exists on a different server, which uses WCF to transmit the data to the client; the client stores the encoded file stream and converts it to Sample2.pdf.

Hopefully my question makes some sense now.

Why are you converting a binary file to a string? That's not something you ever want to do, the file is already correctly stored within `fileBytes`. What is your ultimate aim here? — Alex K., Mar 27 '18 at 11:26
System.Text.Encoding is for encoding text, but a PDF is not a text file — Cleptus, Mar 27 '18 at 11:31
Why read a PDF from one file into memory, and then write it to another? Why not just copy the file? — mjwills, Mar 27 '18 at 11:31
Please help us help you by describing what you want to achieve. — hendryanw, Mar 27 '18 at 11:33
Though... devil's advocate here, but...technically `Encoding.Default`, being extended ASCII, should never corrupt, should it...? I mean all byte values are valid in it. — Nyerguds, Mar 27 '18 at 11:36
.NET Core and ASP.NET Core are different products. You can run ASP.NET Core over .NET Framework, so please don't confuse the frameworks. — Camilo Terevinto, Mar 27 '18 at 11:37
@Nyerguds it would be nice to *think* that it round-tripped, but... I wouldn't like to rely on it, especially if there are NIL bytes in there; tons of text APIs get twitchy when there are NIL bytes — Marc Gravell, Mar 27 '18 at 11:40
I found a somewhat different workaround for my problem (https://stackoverflow.com/questions/35509088/return-pdf-byte-array-wcf), but I don't want to use it because it has limitations such as file size and timeout issues. — Venkat pv, Mar 28 '18 at 12:11
My question has been completely sidestepped as a simple file copy. None of the answers address my question, which remains the same: why is Encoding.Default.GetBytes not working as expected in .NET Core 2.0? — Venkat pv, Mar 28 '18 at 12:17
@CamiloTerevinto, legacy ASP.NET web applications usually run on .NET Framework, and the latest MVC applications can be built with ASP.NET Core for better portability. There is no space for confusion here. When I run the ASP.NET (Framework 4.6.1) website and the MVC Core 2.0 app, the first one handles the server encoding properly and the MVC app fails to do so. — Venkat pv, Mar 28 '18 at 12:23
@Venkatpv "There is no space for confusion here" is completely wrong. I have only worked with ASP.NET Core applications running on .NET Framework. Whether you do it or not is another thing. Don't confuse the runtime (.NET Framework/.NET Core) with the framework (ASP.NET / ASP.NET Core) — Camilo Terevinto, Mar 28 '18 at 12:25

Marc Gravell · Answer 1 · 2018-03-27T11:35:26.330

7

1: the number of times you should ever use Encoding.Default is essentially zero; there may be a hypothetical case, but if there is one: it is elusive

2: PDF files are not text, so trying to use an Encoding on them is just... wrong; you aren't "GETTING corrupted here" - it just isn't text.
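A minimal sketch of why the round trip in the question breaks. On .NET Core, Encoding.Default is UTF-8, whereas on .NET Framework it is the system's ANSI code page, so arbitrary binary bytes that happened to survive the trip on .NET Framework are now decoded as (mostly invalid) UTF-8 and replaced. The sample bytes and the names `RoundTripDemo`, `RoundTrip`, and `RoundTripBase64` are hypothetical, and Base64 is shown only as one common way to carry bytes inside a string; it is not something the question or this answer uses.

```csharp
using System;
using System.Linq;
using System.Text;

public static class RoundTripDemo
{
    // Decodes the bytes as text with the given encoding, then encodes that text back to bytes.
    public static byte[] RoundTrip(byte[] data, Encoding encoding) =>
        encoding.GetBytes(encoding.GetString(data));

    // Carries the bytes through a string using Base64, which is lossless for any input.
    public static byte[] RoundTripBase64(byte[] data) =>
        Convert.FromBase64String(Convert.ToBase64String(data));

    public static void Main()
    {
        // Hypothetical sample: "%PDF" followed by bytes that are not valid UTF-8.
        byte[] original = { 0x25, 0x50, 0x44, 0x46, 0xE2, 0xE3, 0xCF, 0xD3 };

        byte[] viaUtf8 = RoundTrip(original, Encoding.UTF8);
        byte[] viaBase64 = RoundTripBase64(original);

        Console.WriteLine($"original:   {original.Length} bytes");
        Console.WriteLine($"via UTF-8:  {viaUtf8.Length} bytes, identical: {original.SequenceEqual(viaUtf8)}");
        Console.WriteLine($"via Base64: {viaBase64.Length} bytes, identical: {original.SequenceEqual(viaBase64)}");
    }
}
```

Invalid byte sequences are decoded to the replacement character U+FFFD, which takes three bytes when encoded back to UTF-8. That is consistent with the array in the question growing from 209050 to 327279 bytes.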

You may wish to see "Extracting text from PDFs in C#" or "Reading text from PDF in .NET".

If you simply wish to copy the content without parsing it: File.Copy or Stream.CopyTo are good options.
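As a rough illustration of both options, here is a sketch that copies a file byte for byte with no text encoding involved. The temporary paths, the sample content, and the names `CopyDemo` and `CopyViaStreams` are hypothetical stand-ins for the Sample1.pdf and Sample2.pdf paths in the question.

```csharp
using System;
using System.IO;
using System.Linq;

public static class CopyDemo
{
    // Copies the source file to the destination through streams, byte for byte.
    public static void CopyViaStreams(string sourcePath, string destinationPath)
    {
        using (FileStream source = File.OpenRead(sourcePath))
        using (FileStream destination = File.Create(destinationPath))
        {
            source.CopyTo(destination);
        }
    }

    public static void Main()
    {
        // Hypothetical scratch directory so the sketch does not touch real files.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string sourcePath = Path.Combine(dir, "Sample1.pdf");
        string streamCopyPath = Path.Combine(dir, "Sample2.pdf");
        string fileCopyPath = Path.Combine(dir, "Sample3.pdf");

        // Hypothetical binary content standing in for a real PDF.
        byte[] content = { 0x25, 0x50, 0x44, 0x46, 0x00, 0xE2, 0xE3, 0xCF, 0xD3 };
        File.WriteAllBytes(sourcePath, content);

        CopyViaStreams(sourcePath, streamCopyPath);
        File.Copy(sourcePath, fileCopyPath, overwrite: true);

        Console.WriteLine($"Stream.CopyTo identical: {content.SequenceEqual(File.ReadAllBytes(streamCopyPath))}");
        Console.WriteLine($"File.Copy identical:     {content.SequenceEqual(File.ReadAllBytes(fileCopyPath))}");

        Directory.Delete(dir, recursive: true);
    }
}
```

Stream.CopyTo works with any pair of streams, so the same pattern applies when the source is not a local file.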

edited Mar 27 '18 at 11:35

answered Mar 27 '18 at 11:33

Marc Gravell


Thanks for the answer. I got your point that PDF files are not text. Since that approach (encoding) had been working fine for the last 3 years, I tried to replicate it in my MVC application. Now that I have realized the problem, I am going to correct it. :) – Venkat pv Mar 28 '18 at 12:53
