The linked possible duplicate does not work for me.
I got a byte array of length 10 and a debug output of "b\0\0-\0\0S\0",
where \0 is a control character (I think),
and the string result is
n

I have some textual data with a lot of characters in the 128-159 range,
particularly 150 (a control character, START OF GUARDED AREA) in a spot where I know they mean em-dash.
I am almost positive some data got read in as win1252 and then written as Unicode.

I tried getting a byte array from UTF8 and it did not work.
I tried getting a byte array from every encoding.
In between the n and the S is Unicode decimal 150.

The code below works, but I have only tested it on a very small sample.

Encoding win1252 = Encoding.GetEncoding("Windows-1252");
string s1252;
bool allgood = true;
List<byte> lByte = new List<byte>();
// \u0096 is the decimal 150 character that sits between the n and the S
foreach (char c in "n \u0096 S")
{
    // a char above 255 cannot be a single win1252 byte
    if (c > 255)
    {
        Debug.WriteLine("problem");
        allgood = false;
        break;
    }
    else
        lByte.Add((byte)c);
}
if (allgood)
{
    s1252 = win1252.GetString(lByte.ToArray());
    Debug.WriteLine(s1252);
}
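
The cast loop above does the same thing as encoding the string with ISO-8859-1 (Latin-1), which maps U+0000 to U+00FF onto the byte of the same value, and then decoding those bytes as Windows-1252. A minimal sketch of that idea follows. The class and method names (`Cp1252Repair`, `Repair`) are made up for illustration, and it assumes the whole string was mis-decoded: any character above 255 makes it throw rather than guess.

```csharp
using System;
using System.Text;

static class Cp1252Repair
{
    static Cp1252Repair()
    {
        // Needed on .NET Core / .NET 5+ so that "Windows-1252" can be resolved.
        // On .NET Framework the code page is built in and this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: reinterprets chars 0..255 as Windows-1252 bytes.
    public static string Repair(string input)
    {
        // The exception fallback makes chars above 255 throw
        // EncoderFallbackException instead of silently becoming '?'.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1",
            EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
        Encoding win1252 = Encoding.GetEncoding("Windows-1252");

        byte[] bytes = latin1.GetBytes(input);  // same values as the (byte)c cast
        return win1252.GetString(bytes);
    }

    static void Main()
    {
        // Print the code points of the repaired string.
        foreach (char c in Repair("n \u0096 S"))
            Console.Write((int)c + " ");
        // prints: 110 32 8211 32 83
    }
}
```

Code point 8211 is U+2013, the character Windows-1252 assigns to byte 150. Characters 160 to 255 are identical in both encodings, so they pass through unchanged.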

What is the proper way to convert from Unicode to win1252?

This failed:

string inputStr = "n \u0096 S";  // \u0096 is decimal 150
byte[] bytes = new byte[inputStr.Length * sizeof(char)];
System.Buffer.BlockCopy(inputStr.ToCharArray(), 0, bytes, 0, bytes.Length);
s1252 = win1252.GetString(bytes);
Debug.WriteLine(s1252);

There is an extra byte 194 and the result is
n – S
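
To see where the extra bytes come from, it may help to dump the raw bytes for the same string in both encodings. This is only a diagnostic sketch. `ByteDump` and `Decimals` are hypothetical names, and the input assumes the character between the n and the S is U+0096 (decimal 150).

```csharp
using System;
using System.Text;

static class ByteDump
{
    // Hypothetical helper: formats bytes as space-separated decimals.
    public static string Decimals(byte[] bytes)
    {
        return string.Join(" ", Array.ConvertAll(bytes, b => b.ToString()));
    }

    static void Main()
    {
        string input = "n \u0096 S";

        // UTF-16LE: two bytes per char, the high byte is 0 for chars below 256.
        Console.WriteLine(Decimals(Encoding.Unicode.GetBytes(input)));
        // prints: 110 0 32 0 150 0 32 0 83 0

        // UTF-8: U+0096 becomes the two-byte sequence 194 150.
        Console.WriteLine(Decimals(Encoding.UTF8.GetBytes(input)));
        // prints: 110 32 194 150 32 83
    }
}
```

The 0 bytes are the UTF-16 high bytes and 194 is the UTF-8 lead byte for U+0096. Neither byte sequence is Windows-1252 data, so decoding either with win1252.GetString yields extra characters.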

This failed:

Debug.WriteLine("");

Encoding unicode = Encoding.Unicode;  // UTF-16LE
byte[] unicodeBytes = unicode.GetBytes("n \u0096 S");
foreach (byte b in unicodeBytes)
    Debug.WriteLine(b.ToString() + " ub ");
// problem is here - get some good stuff but extra 0
byte[] win1252Bytes = Encoding.Convert(unicode, win1252, unicodeBytes);
char[] win1252Chars = new char[win1252.GetCharCount(win1252Bytes, 0, win1252Bytes.Length)];
// decode the converted bytes so the char array is actually filled
win1252.GetChars(win1252Bytes, 0, win1252Bytes.Length, win1252Chars, 0);
Debug.WriteLine("");
foreach (char c in win1252Chars)
    Debug.Write(c);
Debug.WriteLine(new string(win1252Chars));
Debug.WriteLine("win1252Chars");
paparazzo
  • "I tried getting a byte array from UTF8 and it did not work" what do you mean with "it did not work"? Do you have sample bytes as hexadecimals, preferably including the start of the text? – Maarten Bodewes Aug 03 '16 at 01:12
  • @MaartenBodewes not the hex but in debug it is "b\0\0-\0\0S\0" – paparazzo Aug 03 '16 at 01:22
  • @shad0wk I tried Convert as I saw it in another answer but it did not work for me. I can post it if you like. It starts with the fact that I cannot get a proper array of bytes. (byte)c on the character array seems to work but does not leave me with a feeling of confidence. – paparazzo Aug 03 '16 at 01:35
  • Your debug output is 8 characters. Why not try and print out the hexadecimals? That would not be too hard, right? There are plenty of C# encoders here on SO. – Maarten Bodewes Aug 03 '16 at 01:38
  • One of these might help http://stackoverflow.com/questions/4351985/converting-unicode-to-windows-1252-for-vcards, http://stackoverflow.com/questions/5568033/convert-a-strings-character-encoding-from-windows-1252-to-utf-8 –  Aug 03 '16 at 01:43
  • @shad0wk I already tried the first. – paparazzo Aug 03 '16 at 01:59
  • Okay what about the other one? –  Aug 03 '16 at 01:59
  • You are too quick. I think I have tried the second but will give it another try. – paparazzo Aug 03 '16 at 02:02
  • The accepted answer in the second one is interesting. @Varun0554 says that the problem is with `Encoding.GetEncoding("Windows-1252").GetBytes(string)` so there could be something wrong with `Encoding.GetEncoding("Windows-1252").GetString(bytes)`? –  Aug 03 '16 at 02:04
  • I am getting kind of frazzled so this may not make sense but in my hack s1252 = win1252.GetString(lByte.ToArray()); works! I cannot get a proper set of bytes with encoding. And I am trying to go the other direction from the link and not reading from a file. At any rate I will post my convert that did not work for me. – paparazzo Aug 03 '16 at 02:11
  • @Paparazzi I think that the windows encoding just doesn't support some Unicode characters. –  Aug 03 '16 at 02:56
  • I tested this string `(✓)/(✓)#(✓)$(✓)%(✓)^(✓)*(✓)&(✓)` and the result showed the same grouped characters, where the tick is. The actual result: `(✓)/(✓)#(✓)$(✓)%(✓)^(✓)*(✓)&(✓)` –  Aug 03 '16 at 03:02
  • @shad0wk Did you test on the string in the question? – paparazzo Aug 03 '16 at 03:12
  • Yeah, just funny characters again. :/ –  Aug 03 '16 at 03:14

0 Answers