
I'm having difficulty deserializing incoming SQS messages that contain UTF-8 characters inside a .NET Core 2.1 Lambda function. Some nodes of the messages look like this: {'documentLibraryName': {'tr_TR': 'Belge Kitapl\\xc4\\xb1\\xc4\\x9f\\xc4\\xb1', 'th_TH': '\\xe0\\xb9\\x84\\xe0\\xb8\\xa5\\xe0\\xb8\\x9a\\xe0\\xb8\\xa3\\xe0\\xb8\\xb2\\xe0\\xb8\\xa3\\xe0\\xb8\\xb5\\xe0\\xb9\\x80\\xe0\\xb8\\xad\\xe0\\xb8\\x81\\xe0\\xb8\\xaa\\xe0\\xb8\\xb2\\xe0\\xb8\\xa3', 'bg_BG': '\\xd0\\x91\\xd0\\xb8\\xd0\\xb1\\xd0\\xbb\\xd0\\xb8\\xd0\\xbe\\xd1\\x82\\xd0\\xb5\\xd0\\xba\\xd0\\xb0 \\xd0\\xbd\\xd0\\xb0 \\xd0\\xb4\\xd0\\xbe\\xd0\\xba\\xd1\\x83\\xd0\\xbc\\xd0\\xb5\\xd0\\xbd\\xd1\\x82\\xd0\\xb8'}}

Deserializing this message string with JSON.NET, like so: `var result = JsonConvert.DeserializeObject(message);` throws the following exception: `Bad JSON escape sequence: \x. Path 'data.v2.documentLibraryName.tr_TR'.` I tried replacing `\\` with `\\\\` but got the same result. How can the UTF-8 characters above be decoded so that the message can be processed successfully while preserving the data?

Thank You!

Victor
  • Once I fix the property name and string value quotes (`'` should be `"`) I can parse that just fine, see https://dotnetfiddle.net/w5rBEe. There might be some confusion between C# string escaping and JSON string escaping, so could you please share a full [mcve]? Maybe you copied the string from some Visual Studio string visualizer that added additional escaping? – dbc Jan 22 '21 at 00:24
  • Is this a full [mcve] here? https://dotnetfiddle.net/G4Wqa1 If so then `\xc4` is a **malformed JSON Unicode escape sequence**. According to the [standard](https://www.json.org/json-en.html) an escaped unicode character should look like `\uXXXX` where the `X` characters are Hex digits. That needs to be fixed on the sending side. Maybe related: [Python encoded utf-8 string \xc4\x91 in Java](https://stackoverflow.com/q/18594177/3744182). – dbc Jan 22 '21 at 00:28
  • Thanks for replying. The sending side is in fact `Python`, which is why the message is formed like this, and it's hard to fix there for business reasons. The only difference is that the messages above are shown as they appear in the `AWS Lambda Logs`, like so: `\\xd0\\xbb` etc. I noticed you are using the verbatim string literal `@`, which I cannot do because we have a string variable, not a constant. – Victor Jan 22 '21 at 00:54
  • @dbc Can you post an answer, so I can accept? – Victor Jan 25 '21 at 14:48
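
Following up on the comments above: if the sender really cannot be changed, one possible workaround is to repair the `\xHH` escapes on the receiving side before handing the string to JSON.NET. The sketch below is only an illustration under assumptions, not a confirmed fix. `PythonHexEscapes` and `Decode` are hypothetical names. It assumes every `\xHH` run in the message is a UTF-8 byte sequence produced by Python, that the backslash may arrive either single or doubled (as in the Lambda logs), and that no legitimate JSON value contains an escaped backslash followed by `xHH`. It works on an ordinary string variable at runtime; the `@` literal appears only in the demo.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of JSON.NET or the AWS SDK.
public static class PythonHexEscapes
{
    // One or more consecutive \xHH escapes, each written with one or two backslashes.
    private static readonly Regex HexRun =
        new Regex(@"(?:\\{1,2}x[0-9A-Fa-f]{2})+", RegexOptions.Compiled);

    // A single \xHH escape inside a run; group 1 captures the two hex digits.
    private static readonly Regex HexByte =
        new Regex(@"\\x([0-9A-Fa-f]{2})", RegexOptions.Compiled);

    // Replaces every run of \xHH escapes with the text those bytes encode in UTF-8.
    public static string Decode(string input)
    {
        return HexRun.Replace(input, run =>
        {
            var bytes = new List<byte>();
            foreach (Match m in HexByte.Matches(run.Value))
            {
                bytes.Add(Convert.ToByte(m.Groups[1].Value, 16));
            }
            // Decode the whole run at once so multi-byte characters stay intact.
            return Encoding.UTF8.GetString(bytes.ToArray());
        });
    }
}

public static class Program
{
    public static void Main()
    {
        // Shortened sample taken from the question; doubled backslashes as logged.
        string message = @"{'tr_TR': 'Belge Kitapl\\xc4\\xb1\\xc4\\x9f\\xc4\\xb1'}";
        string repaired = PythonHexEscapes.Decode(message);
        // repaired is now {'tr_TR': 'Belge Kitaplığı'}
        Console.WriteLine(repaired == "{'tr_TR': 'Belge Kitaplığı'}");
    }
}
```

The repaired string can then be passed to `JsonConvert.DeserializeObject(repaired)` as in the question. Decoding a whole run in one step matters because a character such as `ı` spans two bytes (`c4 b1`), so converting each escape separately would produce garbage. If the decoded bytes could ever include a double quote, a backslash, or a control character, the result would need to be re-escaped for JSON, which this sketch does not attempt.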

0 Answers