
We have a SQL Server instance running on Azure RDS. A stored procedure doing ETL work that parses JSON values is misbehaving, and I ultimately tracked the problem down to this minimal reproducible scenario:

```sql
CREATE TABLE [dbo].[conditions_test] (
    [condition_notation] nvarchar(max),
    [condition_id] nvarchar(max)
);

INSERT INTO conditions_test (condition_id, condition_notation) VALUES ('25271', '≤');
```
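
Inspecting the stored value confirms the corruption (a quick check; `UNICODE` returns the code point of a string's first character):

```sql
-- Inspect what was actually stored: UNICODE() returns the code point of
-- the first character. 8804 would be '≤' (U+2264); 61 is '=' (U+003D).
SELECT condition_notation,
       UNICODE(condition_notation) AS code_point
FROM conditions_test;
```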

The problem is that this character ('≤') gets converted into '=', and querying the table returns:

```
condition_notation  condition_id
=                   25271
```

I've tried changing the column collation with `ALTER TABLE ... ALTER COLUMN ... COLLATE Latin1_General_100_CI_AI_SC`, but it made no difference.
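
For concreteness, this is roughly the statement I ran (a sketch against the repro table above; the collation change did not stop the character loss):

```sql
-- Attempted fix: change the column collation. This alters comparison and
-- code-page metadata for the column, but made no difference here because
-- the '≤' was already lost before it reached the column.
ALTER TABLE [dbo].[conditions_test]
    ALTER COLUMN [condition_notation] nvarchar(max) COLLATE Latin1_General_100_CI_AI_SC;
```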

The JSON actually originates in another database running MySQL, where the `CHARSET()` function on the original JSON reports `utf8mb4`; this data is copied over to the MS SQL database for further ETL work.

Why is this happening, and how can I fix it?

Carmageddon
  • The Unicode character will be preserved if you use `values(N'25271', N'≤')` – Stu Apr 18 '23 at 18:41
  • Even [simpler repro](https://dbfiddle.uk/TU__vVKh): `DECLARE @x nvarchar(max) = '≤'; SELECT @x;` – Aaron Bertrand Apr 18 '23 at 19:31
  • Or `SELECT '≤', N'≤';` – Aaron Bertrand Apr 18 '23 at 19:38
  • This isn't anything to do with JSON; it's how you've written your T-SQL code. `'≤'` is a character literal interpreted according to the current database's default collation, whereas `N'≤'` is a National character literal and is treated as a Unicode code point. Whenever you deal with the `nchar` or `nvarchar` data types, or the long-deprecated `ntext` data type, you need to use National character literals with the `N` prefix, or you risk the character loss or corruption you've experienced here. – AlwaysLearning Apr 18 '23 at 21:54
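
A minimal sketch tying the comments together (assuming the repro table above and a default database collation whose code page lacks '≤', such as `SQL_Latin1_General_CP1_CI_AS`):

```sql
-- Side by side: a varchar literal vs. a National (Unicode) literal.
-- Without the N prefix, '≤' is interpreted in the database default
-- collation's code page and can be best-fit mapped to '=' (as observed
-- in the question) before it ever reaches the nvarchar column.
SELECT '≤' AS without_n_prefix, N'≤' AS with_n_prefix;

-- The INSERT from the question, fixed with N-prefixed literals,
-- preserves the character:
INSERT INTO conditions_test (condition_id, condition_notation)
VALUES (N'25271', N'≤');

SELECT condition_notation, condition_id FROM [dbo].[conditions_test];
```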

0 Answers