In .NET I want to decode some raw data encoded by a C++ application. The C++ application is 32-bit and the C# application is 64-bit.
The C++ application supports Russian and Spanish characters, but it does not support Unicode. My C# binary reader fails to read Russian or Spanish characters and works only for English ASCII characters.
CArchive doesn't specify any encoding, and I am not sure how to read its output from C#.
I've tested this with a couple of simple strings. This is what the C++ CArchive writes:
For "ABC" : "03 41 42 43"
For "ÁåëÀÇ 7555Â" : "0B C1 E5 EB C0 C7 20 37 35 35 35 C2"
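To make the framing explicit, here is a minimal C++ sketch of how I understand the layout of a CString in the archive. It is based only on the two dumps above and on the length markers my C# reader checks for. The function name readRawAnsiString is made up for illustration, and I am assuming multi-byte lengths are little-endian. It returns the raw bytes without decoding them, since the code page is exactly what I don't know.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads one length-prefixed ANSI string from `buf`
// starting at `pos` and advances `pos` past it. Returns the raw bytes.
// Assumed layout: 1-byte length; 0xFF escalates to a 2-byte length;
// 0xFFFE there marks a Unicode string; 0xFFFF escalates to a 4-byte length.
std::string readRawAnsiString(const std::vector<std::uint8_t>& buf, std::size_t& pos)
{
    assert(pos <= buf.size());

    // Throws if fewer than n bytes remain.
    auto need = [&](std::size_t n) {
        if (n > buf.size() - pos) throw std::out_of_range("truncated archive data");
    };
    // Reads an n-byte little-endian unsigned integer (assumption: little-endian).
    auto readLE = [&](std::size_t n) -> std::uint32_t {
        need(n);
        std::uint32_t v = 0;
        for (std::size_t i = 0; i < n; ++i)
            v |= static_cast<std::uint32_t>(buf[pos + i]) << (8 * i);
        pos += n;
        return v;
    };

    std::uint32_t len = readLE(1);
    if (len == 0xFF) {
        len = readLE(2);
        if (len == 0xFFFE)
            throw std::runtime_error("Unicode strings are not handled in this sketch");
        if (len == 0xFFFF) {
            len = readLE(4);
            if (len == 0xFFFFFFFFu)
                throw std::runtime_error("8-byte lengths are not handled in this sketch");
        }
    }

    need(len);
    std::string raw(reinterpret_cast<const char*>(buf.data() + pos), len);
    pos += len;
    return raw;
}
```

Fed the first dump, this returns the three bytes of "ABC". Fed the second dump, it returns 11 bytes starting with C1, so the length prefix itself seems fine and the problem is only in how the bytes are interpreted.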
The following shows how the C++ application writes the binary data.
    void CColumnDefArray::SerializeData(CArchive& Archive)
    {
        int iIndex;
        int iSize;
        int iTemp;
        CString sTemp;
        if (Archive.IsStoring())
        {
            Archive << m_iBaseDataCol;
            Archive << m_iNPValueCol;
            iSize = GetSize();
            Archive << iSize;
            for (iIndex = 0; iIndex < iSize; iIndex++)
            {
                CColumnDef& ColumnDef = ElementAt(iIndex);
                Archive << (int)ColumnDef.GetColumnType();
                Archive << ColumnDef.GetColumnId();
                sTemp = ColumnDef.GetName();
                Archive << sTemp;
            }
        }
    }
And this is how I am trying to read it in C#.
The following can decode "ABC" but not the Russian characters. I've tried setting this.Encoding to
every available option (ASCII, UTF7, etc.). Russian characters work only with Encoding.Default, but that apparently isn't reliable, because encoding and decoding usually happen on different PCs.
    public override string ReadString()
    {
        byte blen = ReadByte();
        if (blen < 0xff)
        {
            // *** For russian characters it comes here.***
            return this.Encoding.GetString(ReadBytes(blen));
        }
        var slen = (ushort) ReadInt16();
        if (slen == 0xfffe)
        {
            throw new NotSupportedException(ServerMessages.UnicodeStringsAreNotSupported());
        }
        if (slen < 0xffff)
        {
            return this.Encoding.GetString(ReadBytes(slen));
        }
        var ulen = (uint) ReadInt32();
        if (ulen < 0xffffffff)
        {
            var bytes = new byte[ulen];
            for (uint i = 0; i < ulen; i++)
            {
                bytes[i] = ReadByte();
            }
            return this.Encoding.GetString(bytes);
        }
        //// Not support for 8-byte lengths
        throw new NotSupportedException(ServerMessages.EightByteLengthStringsAreNotSupported());
    }
What is the correct approach to decoding this? Is selecting the right code page the way to solve it? If so, how can I find out which code page was used to encode the data?
I'd appreciate it if someone could point me in the right direction.
Edit
I guess this question and the article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" clear up some of my doubts. Apparently there is no way to determine the right code page from existing data.
I guess the question now is: is there any code page that supports all Spanish, Russian, and English characters? And can I specify the code page in the C++ CArchive class?
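To make the ambiguity concrete, here is a rough C++ sketch that interprets the payload bytes of my second sample under two different code pages and emits UTF-8. It is deliberately partial: it only handles ASCII and the 0xC0 to 0xFF range, where I am relying on Windows-1251 mapping those bytes to U+0410 through U+044F and Windows-1252 mapping them to U+00C0 through U+00FF. Bytes 0x80 to 0xBF are replaced with '?'. All function names are mine, not from any library.

```cpp
#include <cassert>
#include <string>

// Appends code point `cp` to `out` as UTF-8. Only the 1- and 2-byte forms
// are implemented, because this sketch never produces a code point >= U+0800.
void appendUtf8(std::string& out, unsigned cp)
{
    assert(cp < 0x800);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decodes single-byte text to UTF-8. `highBase` is the code point that
// byte 0xC0 maps to in the chosen code page. Bytes 0x80..0xBF differ
// irregularly between code pages and are not handled here.
std::string decodeSingleByte(const std::string& raw, unsigned highBase)
{
    std::string out;
    for (unsigned char b : raw) {
        if (b < 0x80)
            appendUtf8(out, b);                        // plain ASCII
        else if (b >= 0xC0)
            appendUtf8(out, highBase + (b - 0xC0));    // contiguous letter block
        else
            out += '?';                                // outside this sketch
    }
    return out;
}

// Windows-1251 (Cyrillic): 0xC0 maps to U+0410.
std::string decodeAs1251(const std::string& raw) { return decodeSingleByte(raw, 0x0410); }

// Windows-1252 (Western European): 0xC0 maps to U+00C0.
std::string decodeAs1252(const std::string& raw) { return decodeSingleByte(raw, 0x00C0); }
```

With the payload C1 E5 EB C0 C7 20 37 35 35 35 C2, decodeAs1252 gives back the "ÁåëÀÇ 7555Â" I pasted above, while decodeAs1251 gives a Cyrillic string. Both are valid decodings of the same bytes, which seems to be why the reader cannot tell which code page was intended without being told.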