It started differently: my JSON endpoint did not accept certain data correctly. I started looking, and it turns out that if a .NET string contains a '\uXXXX' symbol, it is not understood by the JSON serializer, by the console window, or even by the debug window in Visual Studio. But if I replace this \uXXXX notation with the actual symbol, everything starts to work.

Example: \u0092 is equal to ’ according to https://www.charbase.com/0092-unicode-private-use-two

If I run this code:

void Main()
{
    var s = "test\u0092";
    Console.WriteLine(s);
    Console.WriteLine(JsonConvert.SerializeObject(s));

    s = s.Replace('\u0092', '’');
    Console.WriteLine(s);
    Console.WriteLine(JsonConvert.SerializeObject(s));

}

The output, copied and pasted here from the console, is:

test
"test"
test’
"test’"

Why is the output not identical? What am I missing here?

avs099
  • Why do you think `\u0092` is equivalent to `’`? Your character `’` is `\u2019`. `\u0092` is an undefined glyph ([by design](https://en.wikipedia.org/wiki/Private_Use_Areas)). – Amadan Jul 25 '18 at 10:58
  • What you are seeing in your console may well depend on the font you are using. – Chris Jul 25 '18 at 11:00
  • @Amadan I googled for \u0092 and got many results indicating it's the `’` symbol?.. where should I look to see if it's undefined symbol? It's coming from mysql database if it matters (and DB Viewer shows it correctly there as well) – avs099 Jul 25 '18 at 11:01
  • @Chris I know about the fonts - but see what `JsonConvert.SerializeObject(s)` did - so it's not about it. – avs099 Jul 25 '18 at 11:02
  • @avs099: I'm not sure I see what you mean... `JsonConvert.SerializeObject` just seems to have wrapped your string in quotes, but hasn't changed it in any way that I can see... – Chris Jul 25 '18 at 11:04
  • U+0092 is a control character, PU2 = "Private use 2". Fonts almost never have a glyph for it. An extra detail for a console window is the Console.OutputEncoding property, it translates from Unicode to the active code page of the console. You need to finish your SO profile if you need guesses to what it might be set at, but seeing no output is not remarkable. – Hans Passant Jul 25 '18 at 11:05
  • @avs099: Best place to see what a unicode character is is the unicode website. In particular 0092 appears in this code chart: https://www.unicode.org/charts/PDF/U0080.pdf . As you can see (and as Hans has said) that character is "Private Use Two" and they haven't provided a glyph for it. It is quite possible that some fonts provide a glyph for it. I assume that the reason some people represent it as they do is because the Windows-1252 charset has it as a `’` (https://en.wikipedia.org/wiki/Windows-1252). – Chris Jul 25 '18 at 11:12
  • Are you definitely using the right character encodings when reading and writing? ie is your DB definitely storing things as Unicode rather than a one byte character set or similar where a value of 92 really might mean the character you want. – Chris Jul 25 '18 at 11:13
  • @Chris: Not just 1252, but all 125X encodings. :) BTW, that's `\x92`, not `\u0092`. In UTF-8, UTF-16 and UTF-32 a single `0x92` byte is illegal; in UTF-8, `\u0092` is represented as `0xC2 0x92`; in UTF-16LE, `92 00`... it's never just `0x92` in Unicode. – Amadan Jul 25 '18 at 11:25
  • @Amadan: I'll take your word on 125X. 1252 was the default Windows one for a long time, which is why I singled that one out (and because it's the one I knew because of how common it was). – Chris Jul 25 '18 at 11:29
  • Yeah :) I'm from Croatia, spent a lot of my youth on 1250... and then might as well check the rest. :P http://www.madore.org/~david/computers/unicode/cstab.html – Amadan Jul 25 '18 at 11:31
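
The claims in the comments above are easy to check from code. A minimal sketch, where the class and helper names (`CodePointDemo`, `HexBytes`) are made up for illustration and are not from the original thread:

```csharp
using System;
using System.Text;

public static class CodePointDemo
{
    // Hypothetical helper: hex dump of a string's bytes in a given encoding.
    public static string HexBytes(string s, Encoding enc)
    {
        return BitConverter.ToString(enc.GetBytes(s));
    }

    public static void Main()
    {
        // U+0092 is a C1 control character, so it has no visible glyph.
        Console.WriteLine(char.IsControl('\u0092'));              // True

        // In Unicode encodings it is never a lone 0x92 byte.
        Console.WriteLine(HexBytes("\u0092", Encoding.UTF8));     // C2-92
        Console.WriteLine(HexBytes("\u0092", Encoding.Unicode));  // 92-00 (UTF-16LE)

        // The right single quotation mark is a different code point, U+2019.
        Console.WriteLine(HexBytes("\u2019", Encoding.UTF8));     // E2-80-99
    }
}
```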

1 Answer

Okay, issue solved. It turns out the column used the latin1_swedish_ci collation, which stores extended ASCII bytes (e.g. 146 for ’). .NET converted that byte into the Unicode character \u0092, which is not a valid code for that symbol. The final solution was inspired by this SO answer:

res = Encoding.GetEncoding(1252).GetString(res.Select(c => (byte) c).ToArray());
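
For anyone who wants to try this in isolation, here is a self-contained sketch of the same one-liner. The class and method names (`Cp1252Fix`, `Redecode`) are hypothetical. The `Encoding.RegisterProvider` call is an assumption for .NET Core 3.0+ / .NET 5+, where code page 1252 is not available until the code pages provider is registered; on the .NET Framework I was using, `Encoding.GetEncoding(1252)` works without it.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Cp1252Fix
{
    // Hypothetical helper: treats each char as a single Windows-1252 byte
    // and decodes those bytes again as Windows-1252 text.
    public static string Redecode(string res)
    {
        // Assumption: needed on .NET Core / .NET 5+ only.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding(1252);

        // The cast keeps only the low 8 bits of each char.
        byte[] bytes = res.Select(c => (byte)c).ToArray();
        return cp1252.GetString(bytes);
    }

    public static void Main()
    {
        string repaired = Redecode("test\u0092");
        Console.WriteLine(repaired == "test\u2019"); // True
    }
}
```

Note that the `(byte)` cast silently truncates any character above U+00FF, so this is only safe when every character in the string really came from a single-byte column.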
avs099