2

I'm trying to take a string in SQL Server that contains an accented character (the acute e specifically) and have it convert correctly when using FOR XML PATH as follows:

SELECT 'é' AS Accent
FOR XML PATH('')

/* Produces: <Accent>&eacute;</Accent> */

However, it always comes through with the accent, or tries to convert the other characters instead. This is what I've tried so far:

SELECT 'é' AS Accent
FOR XML PATH('')

/* Produces: <Accent>é</Accent> */

SELECT N'é' AS Accent
FOR XML PATH('')

/* Produces: <Accent>é</Accent> */

SELECT REPLACE('é', 'é', '&eacute;') AS Accent
FOR XML PATH('')

/* Produces: <Accent>&amp;eacute;</Accent> */

SELECT '<![CDATA[' + 'é' + ']]>' AS Accent
FOR XML PATH('')

/* Produces: <Accent>&lt;![CDATA[é]]&gt;</Accent> */

SELECT CAST('<![CDATA[' + 'é' + ']]>' AS XML) AS Accent
FOR XML PATH('')

/* Produces: <Accent>é</Accent> */

I've looked for quite a while and can't find anything apart from casting the end result XML into a string and manually replacing the character - I'm looking for a more correct way to do this. Is this just a bug in SQL Server? It seems to convert plenty of other characters (<, >, &, etc...).

Huge thanks in advance for any assistance.

I'm using SQL Server 2008.

chazbot7
  • 598
  • 3
  • 12
  • 34
  • Do I understand this correctly: There's nothing wrong with `é` actually, but - for any reason - you need this to be `é`? – Shnugo Feb 24 '17 at 17:27
  • @Shnugo correct - the XML outputted by this is imported by another system that can't read the accented character correctly without it being converted. – chazbot7 Feb 24 '17 at 17:29
  • Your first "What I tried says "Produces: é */". I'm confused. – smoore4 Feb 24 '17 at 17:29
  • @SQLDBA correct - the first query in the "What I tried" section outputs é – chazbot7 Feb 24 '17 at 17:32
  • I see. You want it to look like "é". Is the other system using BULK IMPORT? The CODEPAGE option can solve the import issue. I've run into that before. Check this out: https://msdn.microsoft.com/en-us/library/ms191184.aspx – smoore4 Feb 24 '17 at 17:37
  • See this SO question: http://stackoverflow.com/questions/13996967/special-characters-displaying-incorrectly-after-bulk-insert – smoore4 Feb 24 '17 at 17:39
  • @SQLDBA I unfortunately don't have access to the other system, but I know it's loaded into Postgres using a Java program. Thanks for the link though, I've never used BULK IMPORT before. Looks interesting. – chazbot7 Feb 24 '17 at 17:39

1 Answers1

2

I don't know for sure, but I'd assume, that this is not possible without a hack.

SQL-Server treats XML as 2 byte utf-16 internally. So all characters, which are displayable within this range do not need escaping.

It is a different thing, when it comes to unprintable characters or letters, which have a declarative meaning, like <, > and & or quotes (in some cases).

I'd probably use a cast to NVARCHAR(MAX) and then a replace on string level at the point where you export this data. You won't be able to cast this back to XML:

--Push your test XML into a variable

DECLARE @xml XML=
(
    SELECT 'é' AS Accent
    FOR XML PATH('')
);
SELECT @xml; --<Accent>é</Accent>

--Cast it to string and do the replacement on string-level

SELECT REPLACE(CAST(@xml AS NVARCHAR(MAX)),'é', '&eacute;') --<Accent>&eacute;</Accent>

--Trying to cast this result back to XML fails

SELECT CAST(REPLACE(CAST(@xml AS NVARCHAR(MAX)),'é', '&eacute;') AS XML); 

XML parsing: line 1, character 16, well formed check: undeclared entity

Obvioulsy SQL Server's XML engine doesn't even know this entity...

Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • Bummer. My main issue with this is the size of the XML that needs to go through this - probably best to do it outside SQL Server. Thanks a ton for your above example and explanation. If I don't get anything else I'll accept this as the answer. – chazbot7 Feb 24 '17 at 18:49