Which System.Text.Encoding is used by CharSet.Ansi?

In a .NET Core app, I want to decode a string that was previously marshalled by C++ code, without defining a structure and calling Marshal.PtrToStructure.

Encoding.GetEncoding(???).GetString(...)

In a .NET Framework application System.Text.Encoding.Default works:

using System;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading;

[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
public struct Structure
{
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 5)]
    public string FieldA;
}

public class Net461App
{
    static void Main(string[] args)
    {
        var @struct = new Structure { FieldA = "äöüß" };
        byte[] buffer = ToByteArray(@struct);
        var unmarshalled = ToStructure<Structure>(buffer).FieldA;                     // "äöüß"

        Console.WriteLine(Encoding.Default.GetString(buffer).Trim('\0'));             // "äöüß"
        Console.WriteLine(Encoding.Default.EncodingName);                             // Western European (Windows)                    
        Console.WriteLine(Encoding.Default.CodePage);                                 // 1252

        int ansiCodePage = Thread.CurrentThread.CurrentCulture.TextInfo.ANSICodePage; // 1252
        Encoding ansiEncoding = Encoding.GetEncoding(ansiCodePage);                   // works
        Console.WriteLine(ansiEncoding.GetString(buffer).Trim('\0'));                 // "äöüß"
        Console.WriteLine(ansiEncoding.EncodingName);                                 // Western European (Windows) 
        Console.WriteLine(ansiEncoding.CodePage);                                     // 1252 
    }

    public static byte[] ToByteArray<T>(T structure) where T : struct
    {
        var buffer = new byte[Marshal.SizeOf(structure)];
        IntPtr handle = Marshal.AllocHGlobal(buffer.Length);
        try
        {
            // fDeleteOld is false: the freshly allocated block holds no previous structure to destroy.
            Marshal.StructureToPtr(structure, handle, false);
            Marshal.Copy(handle, buffer, 0, buffer.Length);
            return buffer;
        }
        finally
        {
            Marshal.FreeHGlobal(handle);
        }

    }

    public static T ToStructure<T>(byte[] buffer) where T : struct
    {
        IntPtr handle = Marshal.AllocHGlobal(buffer.Length);
        try
        {
            Marshal.Copy(buffer, 0, handle, buffer.Length);
            return Marshal.PtrToStructure<T>(handle);
        }
        finally
        {
            Marshal.FreeHGlobal(handle);
        }

    }
}

Here Encoding.Default uses the same code page as Thread.CurrentThread.CurrentCulture.TextInfo.ANSICodePage, and both produce the same string as CharSet.Ansi. But is TextInfo.ANSICodePage always the code page that CharSet.Ansi uses?

In .NET Core, Encoding.Default is different (UTF-8), and code page 1252 is not supported out of the box:

public class NetCore2App
{
    static void Main(string[] args)
    {
        var @struct = new Structure { FieldA = "äöüß" };
        byte[] buffer = ToByteArray(@struct);
        var unmarshalled = ToStructure<Structure>(buffer).FieldA;                     // "äöüß"

        Console.WriteLine(Encoding.Default.GetString(buffer).Trim('\0'));             // "????"
        Console.WriteLine(Encoding.Default.EncodingName);                             // Unicode (UTF-8)
        Console.WriteLine(Encoding.Default.CodePage);                                 // 65001

        int ansiCodePage = Thread.CurrentThread.CurrentCulture.TextInfo.ANSICodePage; // 1252
        Encoding ansiEncoding = Encoding.GetEncoding(ansiCodePage);                   // throws "No data is available for encoding 1252."
        Console.WriteLine(ansiEncoding.GetString(buffer).Trim('\0'));                 // ...
        Console.WriteLine(ansiEncoding.EncodingName);                                 // ...
        Console.WriteLine(ansiEncoding.CodePage);                                     // ...
    }
    // ...
}

Update:

While investigating the suggestion to use System.Text.Encoding.CodePages, I found the following hint about getting the system's current ANSI code page:

Encoding.GetEncoding

The following seems to work:

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
int currentAnsiCodePage = Encoding.GetEncoding(0).CodePage;
Encoding encoding = Encoding.GetEncoding(currentAnsiCodePage);

Each of the following gives me the same code page, which correctly decodes the string in my test:

  • Encoding.GetEncoding(0).CodePage (requires CodePagesEncodingProvider registration)
  • Thread.CurrentThread.CurrentCulture.TextInfo.ANSICodePage
  • CultureInfo.CurrentCulture.TextInfo.ANSICodePage

I am not sure which one is preferable and works on all machines.
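The approach from the update can be sketched as a small helper. This is only an illustration, not a confirmed answer: the class and method names (AnsiDecoder, Decode, DecodeWithSystemCodePage) are hypothetical, and it assumes the System.Text.Encoding.CodePages provider is available to the project. On non-Windows systems Encoding.GetEncoding(0) may well fall back to UTF-8, so passing an explicit code page is likely the safer choice when the producer of the bytes is known.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original question.
public static class AnsiDecoder
{
    // Decodes a fixed-size, NUL-padded buffer using an explicit code page.
    public static string Decode(byte[] buffer, int codePage)
    {
        // Makes Windows code pages such as 1252 available on .NET Core.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding encoding = Encoding.GetEncoding(codePage);

        // A ByValTStr field is NUL-terminated, so stop at the first zero byte.
        int length = Array.IndexOf(buffer, (byte)0);
        if (length < 0)
        {
            length = buffer.Length;
        }

        return encoding.GetString(buffer, 0, length);
    }

    // Decodes with whatever code page Encoding.GetEncoding(0) reports.
    // Assumption: on Windows this is the system ANSI code page.
    public static string DecodeWithSystemCodePage(byte[] buffer)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        int codePage = Encoding.GetEncoding(0).CodePage;
        return Decode(buffer, codePage);
    }
}
```

For the test buffer from the question, AnsiDecoder.Decode(buffer, 1252) should give the same string as the marshalled FieldA, independent of the machine's current culture.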

Emmanuel DURIN
wierob
  • I expect Ansi to always return the system locale codepage, i.e. 1252 for a Western European Windows, 1250 for e.g. Czech, 1251 for Russian, etc. As for 1252 being unknown, see here: https://stackoverflow.com/questions/37870084/net-core-doesnt-know-about-windows-1252-how-to-fix – LocEngineer Feb 27 '18 at 13:31
  • But how do I get the current system locale codepage programmatically? Is TextInfo.ANSICodePage the correct way? – wierob Feb 27 '18 at 13:44
  • I would expect this to be so, yes. – LocEngineer Feb 27 '18 at 13:48
  • The thing called "ANSI" in native functions (which is, I should point out, *not* ANSI, but that misnomer's very old) is the current system code page as retrieved by [`GetACP`](https://msdn.microsoft.com/library/windows/desktop/dd318070). If writing P/Invoke code that I know to be Windows-specific, I might prefer calling to this rather than trying to figure out which piece of managed code corresponds exactly to that. If the code must be portable, I suppose all bets are off. `CultureInfo.CurrentCulture.TextInfo.ANSICodePage` *should* be fine, but I have no idea how it's implemented on Linux. – Jeroen Mostert Feb 27 '18 at 15:35
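
The GetACP suggestion in the last comment could look roughly like the sketch below. It is Windows-specific by design; the wrapper class and method names (NativeAnsiCodePage, TryGet) are hypothetical, and what to do on other platforms is left open because the comments do not settle it.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around the Win32 GetACP function.
public static class NativeAnsiCodePage
{
    // UINT GetACP(void) from kernel32.dll returns the current ANSI code page identifier.
    [DllImport("kernel32.dll")]
    private static extern uint GetACP();

    // Returns true and the ANSI code page on Windows; returns false elsewhere.
    public static bool TryGet(out int codePage)
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            codePage = (int)GetACP();
            return true;
        }

        // Assumption: there is no equivalent system setting to query on other platforms.
        codePage = 0;
        return false;
    }
}
```

On .NET Core the returned identifier still has to be passed to Encoding.GetEncoding after registering CodePagesEncodingProvider.Instance, as in the update above.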

0 Answers