0

I have issue with SQL Server and JSON :/

There are a lot of JSON strings coming from a remote database (which I only have read permission). The problem is that all of them are parsed into unicodes and I don't know how to turn them into UTF-8 !

Take a look at this JSON for example:

{
   "0":{
      "id":1,
      "name":"\\u062a\\u0646"
   },
   "1":{
      "id":2,
      "name":"\\u0628\\u0633\\u062a\\u0647 10\\u062a\\u0627\\u06cc\\u06cc"
   },
   "2":{
      "id":3,
      "name":"\\u0639\\u062f\\u062f 10\\u062a\\u0627\\u06cc\\u06cc"
   },
   "5":{
      "id":6,
      "name":"\\u0639\\u062f\\u062f 1000\\u062a\\u0627\\u06cc\\u06cc"
   },
   "6":{
      "id":7,
      "name":"\\u06a9\\u06cc\\u0644\\u0648\\u06af\\u0631\\u0645"
   }
}

As you can see field name is parsed into unicode. I did this to extract data into a table:

SELECT Units.*
FROM OPENJSON(@Json) AS i
CROSS APPLY OPENJSON(i.[value]) WITH (
   [Id] INT '$.id',
   [Name] NVARCHAR(MAX) '$.name'
) AS Units

It will then show something like this as result:

  • Id: 1, Name: \u062a\u0646
  • Id: 2, Name: \u0628\u0633\u062a\u0647 10\u062a\u0627\u06cc\u06cc
  • ...

But I need something like this as result:

  • Id: 1, Name: تن
  • Id: 2, Name: عدد 10تایی
  • ...
Zhorov
  • 28,486
  • 6
  • 27
  • 52
Farhad Rad
  • 563
  • 2
  • 15
  • 3
    Why not `... FROM OPENJSON(REPLACE(@Json, N'\\', N'\')) AS i`? More information [here](https://www.json.org/json-en.html). – Zhorov Aug 13 '21 at 12:28
  • Thank you @Zhorov it solved my problem :) – Farhad Rad Aug 13 '21 at 12:31
  • Not sure if this can help, but prefixing with 'N' signifies string contains unicode. https://stackoverflow.com/questions/46842044/openjson-unable-to-parse-chinese-characters – Phaelax z Aug 13 '21 at 12:32
  • What you posted are escape sequences, not Unicode strings. This page is a Unicode string. Your question is Unicode. `nvarchar` fields are Unicode. What is `@json` though and what does it contain? Why does `json` contain those escape sequences instead of actual Unicode characters like `تن` ? – Panagiotis Kanavos Aug 13 '21 at 12:39
  • BTW escape sequences have only a single backslash. If `@json` contains `\\u0628` instead of `\u0628` it's invalid JSON. Most likely the code that generated the string had a bug and doubled the backslashes – Panagiotis Kanavos Aug 13 '21 at 12:55

1 Answers1

0

Some explanations from ECMA-404 standard: A JSON string is a sequence of Unicode code points wrapped with quotation marks (U+0022). ... If the code point is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the code point.

It seems, that the names are escaped, so a possible option is to replace the escaped \:

JSON:

DECLARE @json nvarchar(max) = N'{
   "0":{
      "id":1,
      "name":"\\u062a\\u0646"
   },
   "1":{
      "id":2,
      "name":"\\u0628\\u0633\\u062a\\u0647 10\\u062a\\u0627\\u06cc\\u06cc"
   },
   "2":{
      "id":3,
      "name":"\\u0639\\u062f\\u062f 10\\u062a\\u0627\\u06cc\\u06cc"
   },
   "5":{
      "id":6,
      "name":"\\u0639\\u062f\\u062f 1000\\u062a\\u0627\\u06cc\\u06cc"
   },
   "6":{
      "id":7,
      "name":"\\u06a9\\u06cc\\u0644\\u0648\\u06af\\u0631\\u0645"
   }
}'

Statement:

SELECT Units.*
FROM OPENJSON(REPLACE(@Json, N'\\u', N'\u')) AS i
CROSS APPLY OPENJSON(i.[value]) WITH (
   [Id] INT '$.id',
   [Name] NVARCHAR(MAX) '$.name'
) AS Units

Result:

Id  Name
1   تن
2   بسته 10تایی
3   عدد 10تایی
6   عدد 1000تایی
7   کیلوگرم
Zhorov
  • 28,486
  • 6
  • 27
  • 52
  • Even so, the string should contain a single backslash, not two. `\\u0628` is invalid. The correct escape sequnce is `\u0628` – Panagiotis Kanavos Aug 13 '21 at 12:53
  • @PanagiotisKanavos It will so in the end, but since it's a literal residing in a different context a double backslash needs to be used to effectively print one backslash. Nested strings are like nested loops: logical. – AmigoJack Aug 13 '21 at 14:20
  • @AmigoJack no it's not. That's not how escaping works. That's why *removing* the double backslashes works. Whatever code produced this string has a serious bug. Try any online validator with a string with *single* backslashes. You'll see it works just fine. In fact, there's no reason to use escape sequences at all. Entering `عدد 10تایی` in the attribute value works just fine – Panagiotis Kanavos Aug 13 '21 at 15:36
  • Now I understood Q and A: it's not about UTF-8 or Unicode or JSON - it's about the mysterious source that "has" double backslashes. But there's no indication it's actually crafted like this or just mis-consumed already. In most SQL dialects a backslash in string literals has a special meaning - that's why I would consider `N'{ "\\u062a" ... }'` to be the proper SQL for an outcome of proper JSON. – AmigoJack Aug 13 '21 at 16:00