I have an application that gets a XML back from an API, the XML is saved in a column in a MS SQL database. This XML is used to render a report.
My problem is that MS SQL Server (2014 - 2022) seems to have problems with these special chars in XML tags. To test this simply use this:
DECLARE @xml as XML
--SET @xml = '<?xml version="1.0" encoding="UTF-8"?><test>this doesnt work É ó</test>'
SET @xml = '<?xml version="1.0" encoding="UTF-8"?><test>this works</test>'
select @xml.value('(/test)[1]','nvarchar(16)') as Test
FROM @xml.nodes('//*[text()]') AS x(y)
The first SET doesnt work - invalid xml symbol. The second SET works.
The normal .net XML stuff has no problems with this characters neither does the API which gives me this back.
Currently I string.replace
these characters in .net but there are always more new ones coming. When there is a new invalid char it breaks the process and I have to add it into the string replace logic, which means recompiling the application.
I can't whitelist a-zA-Z because I need to replace É -> E and Ð -> D and so on.
Even CDATA-encasing doesn't help.