3

Converting an XML string using CAST( AS XML) works as expected in many scenarios, but fail with an error "illegal xml character" if the string contains accented chars.

This example fails with error "XML parsing: line 2, character 8, illegal xml character":

declare @Text VARCHAR(max) = 
'<?xml version="1.0" encoding="UTF-8"?>
<ROOT>níveis porém alocação</ROOT>'

select CAST(@Text AS XML)

According to XML Specification all of them are legal XML chars, but replacing accented chars with an 'X' char will result in a sucessfull CAST:

declare @MessageText VARCHAR(max) = 
'<?xml version="1.0" encoding="UTF-8"?>
<ROOT>nXveis porXm alocaXXo</ROOT>'

select CAST(@MessageText AS XML)

Result: <ROOT>nXveis porXm alocaXXo</ROOT>

Moreover, the same XML but UTF-16 encoded, inexplicably works:

declare @MessageText NVARCHAR(max) = 
'<?xml version="1.0" encoding="UTF-16"?>
<ROOT>níveis porém alocação</ROOT>'

select CAST(@MessageText AS XML)

Result: <ROOT>níveis porém alocação</ROOT>

Are those chars illegal in UTF-8? Or there is a better way to convert into an XML datatype?

Yitzhak Khabinsky
  • 18,471
  • 2
  • 15
  • 21
m.r
  • 31
  • 2
  • In the first example, you declared Text variable, but select MessageText. If replace the variable name, it works fine on my server. – Max Zolotenko Dec 13 '19 at 19:35

1 Answers1

0

SQL Server strips any XML Declaration prolog internally for XML data type and uses UTF-16 encoding. Here is how to handle correctly your use case.

SQL

-- Method #1
DECLARE @Text NVARCHAR(MAX) = N'<ROOT>níveis porém alocação</ROOT>';
SELECT CAST(@Text AS XML);

-- Method #2
DECLARE @MessageText NVARCHAR(MAX) = 
'<?xml version="1.0" encoding="UTF-16"?>
<ROOT>níveis porém alocação</ROOT>';

SELECT CAST(@MessageText AS XML);
Yitzhak Khabinsky
  • 18,471
  • 2
  • 15
  • 21
  • Please, may you provide any reference about "SQL Server strips any XML Declaration prolog internally"? In that case this code should work but is not: `DECLARE @MessageText NVARCHAR(MAX) = 'níveis porém alocação'; SELECT CAST(@MessageText AS XML);` – m.r Dec 16 '19 at 16:13
  • Here is a good link for your use case: https://stackoverflow.com/questions/41494856/sql-server-defining-an-xml-type-column-with-utf-8-encoding – Yitzhak Khabinsky Dec 16 '19 at 17:34
  • Another good BOL link: https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2005/administrator/bb330962(v=sql.90)?redirectedfrom=MSDN#xml-data-type – Yitzhak Khabinsky Dec 16 '19 at 17:40
  • And one more, specially its Comments section: https://techcommunity.microsoft.com/t5/SQL-Server/Introducing-UTF-8-support-for-SQL-Server/ba-p/734928 – Yitzhak Khabinsky Dec 16 '19 at 17:59