
I have a file in Spanish stored online in Azure Blob Storage. Some words contain special characters (for example: Almacén). When I open the file in Notepad++, the encoding is shown as ANSI.

I am trying to read the file with this code:

        using StreamReader reader = new StreamReader(Stream, Encoding.UTF8);
        blobStream.Seek(0, SeekOrigin.Begin);
        var allLines = await reader.ReadToEndAsync();

The issue is that "allLines" is not decoded properly; I get text like: Almac�n

I have tried some solutions like this one: C# Convert string from UTF-8 to ISO-8859-1 (Latin1) H

but it is still not working.

(The final goal is to "merge" two CSV files, so I read the stream of both, remove the header, and concatenate the strings to upload the result again. If there is a better way to merge CSV files in C# that avoids this encoding issue, I am open to that too.)

kilag

1 Answer


You are trying to read a file that is not UTF-8 encoded as if it were UTF-8 encoded. I can replicate this issue with:

var s = "Almacén";
using var memStream = new MemoryStream(Encoding.GetEncoding(28591).GetBytes(s));

using var reader = new StreamReader(memStream, Encoding.UTF8);
var allLines = await reader.ReadToEndAsync();

Console.WriteLine(allLines); // writes "Almac�n" to console
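
To see why the replacement character appears, it can help to look at the raw bytes. The following is only a minimal sketch, assuming .NET 5 or later with top-level statements; the variable names are illustrative. It shows that "é" is the single byte 0xE9 in ISO-8859-1, which is not a valid sequence in UTF-8, so a UTF-8 decoder substitutes U+FFFD.

```csharp
using System;
using System.Text;

// "\u00E9" is "é"; the escape avoids depending on the source file's own encoding.
var latin1 = Encoding.GetEncoding(28591);
byte[] latin1Bytes = latin1.GetBytes("Almac\u00E9n");
byte[] utf8Bytes = Encoding.UTF8.GetBytes("Almac\u00E9n");

Console.WriteLine(BitConverter.ToString(latin1Bytes)); // 41-6C-6D-61-63-E9-6E
Console.WriteLine(BitConverter.ToString(utf8Bytes));   // 41-6C-6D-61-63-C3-A9-6E

// Decoding the ISO-8859-1 bytes as UTF-8: the lone 0xE9 is invalid and becomes U+FFFD.
string wrong = Encoding.UTF8.GetString(latin1Bytes);
Console.WriteLine(wrong.Contains('\uFFFD')); // True

// Decoding with the encoding that produced the bytes round-trips correctly.
string right = latin1.GetString(latin1Bytes);
Console.WriteLine(right == "Almac\u00E9n"); // True
```

Once the bytes have been decoded with the wrong encoding, the original character is lost, which is likely why converting the resulting string afterwards does not help.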

You should read the file with the ISO-8859-1 encoding ("Western European (ISO)"), which is code page 28591:

using var reader = new StreamReader(Stream, Encoding.GetEncoding(28591));
var allLines = await reader.ReadToEndAsync();
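
For the stated goal of merging two CSV files, one possible approach is to read both streams line by line with the same encoding and skip the first line of the second file. This is only a sketch under assumptions: `MergeCsvAsync` is a hypothetical helper (not part of any library), the in-memory streams stand in for the blob streams, and it assumes the header occupies exactly one line with no quoted line breaks.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper: concatenates two CSV streams, keeping only the first header.
static async Task<string> MergeCsvAsync(Stream first, Stream second, Encoding encoding)
{
    using var firstReader = new StreamReader(first, encoding);
    using var secondReader = new StreamReader(second, encoding);

    var merged = new StringBuilder();
    string? line;

    // Copy the first file completely, header included.
    while ((line = await firstReader.ReadLineAsync()) != null)
        merged.AppendLine(line);

    // Discard the header line of the second file, then copy the rest.
    await secondReader.ReadLineAsync();
    while ((line = await secondReader.ReadLineAsync()) != null)
        merged.AppendLine(line);

    return merged.ToString();
}

// Example usage with sample data standing in for the two blobs.
var latin1 = Encoding.GetEncoding(28591);
using var a = new MemoryStream(latin1.GetBytes("id;nombre\n1;Almac\u00E9n\n"));
using var b = new MemoryStream(latin1.GetBytes("id;nombre\n2;Cami\u00F3n\n"));

string merged = await MergeCsvAsync(a, b, latin1);
Console.Write(merged);

// Encode with the same encoding (or UTF-8, if consumers expect it) before uploading.
byte[] output = latin1.GetBytes(merged);
```

Note that "ANSI" in Notepad++ usually refers to the system's Windows code page (often Windows-1252 on Western European systems) rather than ISO-8859-1. The two agree for characters such as "é", but if the file turns out to be Windows-1252, on .NET Core and .NET 5+ that encoding is only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`.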
phuzi