22

Probably a duplicate of unanswered. SQL Server 2008 - Add XML Declaration to XML Output

Please let me know if this is possible. I read in some blogs

http://forums.asp.net/t/1455808.aspx/1

http://www.devnewsgroups.net/group/microsoft.public.sqlserver.xml/topic60022.aspx

But I couldn't understand why I can't do this.

Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171
001priyank
  • 517
  • 2
  • 5
  • 16
  • Why? It is very appropriate to defer declaring the encoding until the encoding is applied. This is typically done in an XML serializer library. Unfortunately, SQL Server doesn't have a built-in way of serializing XML but you can add one to your database (as in answers to this and other questions.) – Tom Blodget Dec 05 '18 at 18:08

4 Answers4

38

You have to add it manually. SQL Server always stores xml internally as ucs-2 so it is impossible for SQL to generate it a utf-8 encoding header

See "Limitations of the xml Data Type" on MSDN

The XML declaration PI, for example, <?xml version='1.0'?>, is not preserved when storing XML data in an xml data type instance. This is by design. The XML declaration (<?xml ... ?>) and its attributes (version/encoding/stand-alone) are lost after data is converted to type xml. The XML declaration is treated as a directive to the XML parser. The XML data is stored internally as ucs-2.

gbn
  • 422,506
  • 82
  • 585
  • 676
  • Is there a way to do this via command line? I have a giant directory of XML files that are too big to handle in most text editors. – Flater Aug 13 '13 at 14:12
  • Wow! It's been a nightmare until I found out which charset was being used. I have used to add the beginning and end of the file echo -e "\n\n$(cat products.xml)\n" > products_with_xml_root.xml – xarlymg89 Aug 01 '16 at 16:03
6

When I read this post I thought it was the "end of the line"... no solution... I almost gave up on the approach... but in fact there is a way to work around this limitation by converting the XML to varchar(max) and then appending the declaration to the beginning of the string. The following post shows how:

Using SQL Server "FOR XML": Convert Result Datatype to Text/varchar/string whatever?

A simple example would look something like this:

SELECT 'MY DATA' As MyColumn INTO #MyTable 
SELECT '<?xml version="1.0" encoding="UTF-8"?>' + 
CAST((SELECT MyColumn FROM #MyTable FOR XML PATH('')) AS VARCHAR(MAX)) AS XmlData
DROP TABLE #MyTable 

The Output:

<?xml version="1.0" encoding="UTF-8"?>
<MyColumn>MY DATA</MyColumn>
Community
  • 1
  • 1
MichaelBarce
  • 181
  • 2
  • 4
  • 2
    Simply using UTF-8 is not accurate. Use UCS-2 instead, if you have a Unicode string. See [my answer](http://stackoverflow.com/a/23207366/2266979) on a [related question](http://stackoverflow.com/q/6215069/2266979) for more details. If you have a normal varchar string, then use "windows-1252". – Riley Major Apr 21 '14 at 22:54
  • 2
    This would give data type as varchar not as xml – Mohamed Alikhan Apr 28 '16 at 11:46
6

The accepted answer of "add it manually", while technically correct, is incomplete and hence misleading. Simply adding the XML declaration with whatever "encoding" you want doesn't change the actual encoding of the string. This is sometimes ok. If you specify "UTF-8" and convert the XML data to VARCHAR, then as long as all of the characters are standard ASCII characters (values 1 - 127), then sure, it's UTF-8 (at least there is no noticeable difference). BUT, if there are any characters with values 128 or above, then you do not have a UTF-8 encoded XML document. And if you convert the XML data to NVARCHAR, then you have a UTF-16 encoded document, regardless of what you manually specify in the XML declaration. You should only be specifying an encoding IF it is the actual encoding being used.

And until SQL Server 2019 (currently in beta at CTP 2.1), there was no way to get the encoding to be UTF-8 within SQL Server, at least not without using SQLCLR. But in SQL Server 2019, you can now convert the XML to actual UTF-8:

DECLARE @XML XML;
SET @XML = N'<test attr="&#x1F60E;"/>';
SELECT @XML,
       CONVERT(VARBINARY(100), CONVERT(NVARCHAR(MAX), @XML)), -- UTF-16 / UCS-2
       CONVERT(VARBINARY(100),
               CONVERT(VARCHAR(MAX),
                       CONVERT(NVARCHAR(MAX), @XML) COLLATE Latin1_General_100_CI_AS_SC_UTF8)
              ); -- UTF-8

That returns:

Column 1: <test attr="" />
Column 2: 0x3C007400650073007400200061007400740072003D0022003DD80EDE22002F003E00
Column 3: 0x3C7465737420617474723D223F3F222F3E

Since many people won't be on SQL Server 2019 for a while yet, this is possible via SQLCLR. You can use .NET Xml classes (e.g. XmlWriter) to export this with various options. In fact, I created a SQLCLR library of functions, SQL#, that includes such a function: XML_SaveToFile. The XML_SaveToFile function allows for specifying any valid encoding and it will both set that in the XML declaration and ensure that the file is saved with that encoding. It also has options for indenting, newlines, etc. Just FYI: while there are many functions available in the Free version, XML_SaveToFile is only available in the Full (paid for) version.

Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171
1

I have been working with this matter during the last days, and although there might be better solutions, I have ended up quite happy with this bash script:

iconv -f UCS-2 -t UTF-8 products.xml > products_utf8.xml
echo "<?xml version='1.0'?>\n<products>\n$(cat products_utf8.xml)\n</products>" > products_utf8_final.xml

Basically, this script will get a file generated from the horrible bcp software, which generates incomplete and invalid XML data, convert it from the UCS-2 format to UTF-8 (first row), and add at the beginning and end of the file what it needs (second row of the script) to be valid and complete.

It works for me. The script I used to generate the XML file with BCP is:

bcp.exe "select * from dat1.dbo.Products FOR XML AUTO,ELEMENTS” queryout "C:\products.xml" -T -w -r -S .\SQLEXPRESS
xarlymg89
  • 2,552
  • 2
  • 27
  • 41
  • 1
    If you change your select statement to `"select * from dat1.dbo.Products FOR XML AUTO,ELEMENTS,ROOT('products')"`, then the XML generated will be valid with an encompassing outer tag. No need to add those manually, and not bcp's fault either ... – takrl Dec 12 '18 at 11:52
  • Thanks for that extra info! To be honest, I crushed my head looking for documentation, but I couldn't find what I really wanted. Thankfully, I don't have to use that anymore :) – xarlymg89 Dec 12 '18 at 13:47