Which System.Text.Encoding is used by CharSet.Ansi?
In a .NET Core app, I want to decode a string that was previously marshalled by C++ code, without defining a structure and calling Marshal.PtrToStructure:
Encoding.GetEncoding(???).GetString(...)
In a .NET Framework application, System.Text.Encoding.Default works:
using System;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading;

[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
public struct Structure
{
    // Fixed-size inline ANSI string: 4 characters plus the null terminator.
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 5)]
    public string FieldA;
}

public class Net461App
{
    static void Main(string[] args)
    {
        var @struct = new Structure { FieldA = "äöüß" };
        byte[] buffer = ToByteArray(@struct);
        var unmarshalled = ToStructure<Structure>(buffer).FieldA; // "äöüß"

        Console.WriteLine(Encoding.Default.GetString(buffer).Trim('\0')); // "äöüß"
        Console.WriteLine(Encoding.Default.EncodingName); // Western European (Windows)
        Console.WriteLine(Encoding.Default.CodePage); // 1252

        int ansiCodePage = Thread.CurrentThread.CurrentCulture.TextInfo.ANSICodePage; // 1252
        Encoding ansiEncoding = Encoding.GetEncoding(ansiCodePage); // works
        Console.WriteLine(ansiEncoding.GetString(buffer).Trim('\0')); // "äöüß"
        Console.WriteLine(ansiEncoding.EncodingName); // Western European (Windows)
        Console.WriteLine(ansiEncoding.CodePage); // 1252
    }

    // Marshals the structure into unmanaged memory and copies the raw bytes out.
    public static byte[] ToByteArray<T>(T structure) where T : struct
    {
        var buffer = new byte[Marshal.SizeOf(structure)];
        IntPtr handle = Marshal.AllocHGlobal(buffer.Length);
        try
        {
            // fDeleteOld must be false: the freshly allocated block holds no previous structure.
            Marshal.StructureToPtr(structure, handle, false);
            Marshal.Copy(handle, buffer, 0, buffer.Length);
            return buffer;
        }
        finally
        {
            Marshal.FreeHGlobal(handle);
        }
    }

    // Copies the raw bytes into unmanaged memory and marshals them back into a structure.
    public static T ToStructure<T>(byte[] buffer) where T : struct
    {
        IntPtr handle = Marshal.AllocHGlobal(buffer.Length);
        try
        {
            Marshal.Copy(buffer, 0, handle, buffer.Length);
            return Marshal.PtrToStructure<T>(handle);
        }
        finally
        {
            Marshal.FreeHGlobal(handle);
        }
    }
}
Encoding.Default has the same code page as Thread.CurrentThread.CurrentCulture.TextInfo.ANSICodePage, and both produce the same string as CharSet.Ansi. But is TextInfo.ANSICodePage always the same as CharSet.Ansi?
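One way to probe this empirically is to compare the bytes the runtime's ANSI marshalling emits with the bytes a candidate encoding emits. The following is only a minimal sketch, not a definitive test: it assumes that Marshal.StringToHGlobalAnsi performs the same ANSI conversion as a CharSet.Ansi struct field (I have not verified this for every runtime and platform), and AnsiProbe, MarshalAnsiBytes and Matches are hypothetical names introduced for illustration.

```csharp
using System;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;

public static class AnsiProbe
{
    // Returns the bytes produced by the runtime's ANSI string marshalling for `text`.
    // Assumption: this is the same conversion that CharSet.Ansi struct fields use.
    public static byte[] MarshalAnsiBytes(string text)
    {
        IntPtr ptr = Marshal.StringToHGlobalAnsi(text);
        try
        {
            // The unmanaged copy is null-terminated; count bytes up to the terminator.
            int length = 0;
            while (Marshal.ReadByte(ptr, length) != 0)
            {
                length++;
            }

            var bytes = new byte[length];
            Marshal.Copy(ptr, bytes, 0, length);
            return bytes;
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
    }

    // True if `candidate` encodes `text` to exactly the bytes the marshaller produced.
    public static bool Matches(Encoding candidate, string text)
    {
        return MarshalAnsiBytes(text).SequenceEqual(candidate.GetBytes(text));
    }
}
```

Calling AnsiProbe.Matches(candidate, "äöüß") for each candidate encoding shows whether it agrees with the marshaller on the current machine and culture only; it says nothing about other machines. On .NET Core, obtaining a candidate such as code page 1252 may first require registering CodePagesEncodingProvider.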
Encoding.Default is different in .NET Core, and code page 1252 is not supported:
public class NetCore2App
{
    static void Main(string[] args)
    {
        var @struct = new Structure { FieldA = "äöüß" };
        byte[] buffer = ToByteArray(@struct);
        var unmarshalled = ToStructure<Structure>(buffer).FieldA; // "äöüß"

        Console.WriteLine(Encoding.Default.GetString(buffer).Trim('\0')); // "????"
        Console.WriteLine(Encoding.Default.EncodingName); // Unicode (UTF-8)
        Console.WriteLine(Encoding.Default.CodePage); // 65001

        int ansiCodePage = Thread.CurrentThread.CurrentCulture.TextInfo.ANSICodePage; // 1252
        Encoding ansiEncoding = Encoding.GetEncoding(ansiCodePage); // throws "No data is available for encoding 1252."
        Console.WriteLine(ansiEncoding.GetString(buffer).Trim('\0')); // ...
        Console.WriteLine(ansiEncoding.EncodingName); // ...
        Console.WriteLine(ansiEncoding.CodePage); // ...
    }

    // ...
}
Update:
While investigating the suggestion to use System.Text.Encoding.CodePages, I found the following hint about getting the system's current ANSI code page:
Encoding.GetEncoding
- To get the encoding associated with the default ANSI code page in the operating system's regional and language settings, you can either supply a value 0 for the codepage argument
- If the registered provider is the CodePagesEncodingProvider, the method returns the encoding that matches the system active code page when running on the Windows operating system.
The following seems to work:
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
int currentAnsiCodePage = Encoding.GetEncoding(0).CodePage;
Encoding encoding = Encoding.GetEncoding(currentAnsiCodePage);
Each of the following gives me the same code page, which correctly decodes the string in my test:
- Encoding.GetEncoding(0).CodePage (requires CodePagesEncodingProvider registration)
- Thread.CurrentThread.CurrentCulture.TextInfo.ANSICodePage
- CultureInfo.CurrentCulture.TextInfo.ANSICodePage
I am not sure which one is preferable and works on all machines.
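For reference, the registration approach can be wrapped into a small helper. This is a minimal sketch under stated assumptions rather than a confirmed answer: AnsiDecoder, GetSystemAnsiEncoding and Decode are hypothetical names, the System.Text.Encoding.CodePages package is assumed to be referenced where the runtime does not already include it, and cutting the buffer at the first null byte (instead of Trim('\0')) is my own choice for fixed-size ByValTStr fields.

```csharp
using System;
using System.Text;

public static class AnsiDecoder
{
    // Registers the code page provider and asks for code page 0, which the
    // documentation quoted above describes as the system's active code page on Windows.
    public static Encoding GetSystemAnsiEncoding()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(0);
    }

    // Decodes a fixed-size, null-padded buffer, stopping at the first null byte.
    public static string Decode(byte[] buffer, Encoding encoding)
    {
        int terminator = Array.IndexOf(buffer, (byte)0);
        int length = terminator >= 0 ? terminator : buffer.Length;
        return encoding.GetString(buffer, 0, length);
    }
}
```

A call would then look like AnsiDecoder.Decode(buffer, AnsiDecoder.GetSystemAnsiEncoding()). Whether code page 0 always matches what CharSet.Ansi uses on every machine is exactly the open question here, so the sketch does not settle it.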