I have an XML string (UTF-8) that I need to store in a database (MS SQL Server). The string's encoding must be UTF-16.

This code does not work; utf16Xml ends up empty:

XDocument xDoc = XDocument.Parse(utf8Xml);
xDoc.Declaration.Encoding = "utf-16";
StringWriter writer = new StringWriter();
XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings() 
            { Encoding = writer.Encoding, Indent = true });

xDoc.WriteTo(xml);

string utf16Xml = writer.ToString();

utf8Xml is a string that contains a serialized object (encoded as UTF-8).

How can I convert an XML string from UTF-8 to UTF-16?

Solomon Rutzky
FetFrumos
  • In your code above, it looks like the `utf8Xml` variable is of type `string`. But that would be a sick string, then, if it were not in the .NET `string` encoding (.NET uses UTF-16 internally, in case you want to know). Where did the `utf8Xml` string come from? What does it look like? I think the problem is "before" the code we see. – Jeppe Stig Nielsen Jan 31 '14 at 16:07
  • utf8Xml is a string that contains a serialized object (encoded as UTF-8). I exchange data with a service that uses UTF-8. – FetFrumos Jan 31 '14 at 16:21
  • `XDocument.Parse()` only accepts `string`s, which are UTF-16 in implementation. Either you've already turned it into UTF-16 in creating a `string`, or else `Parse` isn't going to work. – Jon Hanna Jan 31 '14 at 16:37
  • In .NET, a `System.String` is a sequence of UTF-16 code units. Usually, one code unit (one `System.Char` value) corresponds to one character (however, two code units, a so-called surrogate pair, are required per character outside plane 0). Your `utf8Xml` variable appears to be a `System.String`. By definition, then, it is not UTF-8. It might be that the string `utf8Xml` was incorrectly constructed from some UTF-8 source. In that case the solution is to go back, find out how that happened, and fix it. – Jeppe Stig Nielsen Feb 01 '14 at 18:53
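
To illustrate the point made in the comments above, here is a minimal sketch of the difference between UTF-8 bytes and a .NET string. The class name, the `DecodeUtf8` helper, and the sample payload are hypothetical; the thread does not show how `utf8Xml` is actually produced, so this assumes the service data is available as a `byte[]`.

```csharp
using System;
using System.Text;

public static class StringEncodingSketch
{
    // Hypothetical helper: turns UTF-8 bytes (e.g. a service response)
    // into a .NET string, which is UTF-16 in memory.
    public static string DecodeUtf8(byte[] utf8Bytes)
    {
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // U+1D11E lies outside plane 0, so it needs a surrogate pair.
        string clef = "\U0001D11E";
        Console.WriteLine(clef.Length); // number of UTF-16 code units, not characters

        // Made-up payload standing in for the bytes received from the service.
        byte[] payload = Encoding.UTF8.GetBytes("<root><item>caf\u00e9</item></root>");

        // Decoding with the matching encoding yields a correct UTF-16 string.
        string xmlText = DecodeUtf8(payload);
        Console.WriteLine(xmlText);
    }
}
```

Once the bytes have been decoded with the matching encoding, the resulting string needs no further "conversion to UTF-16".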

2 Answers

This might help you:

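// Requires: using System.IO; using System.Text; using System.Xml; using System.Xml.Linq;
XDocument xDoc = XDocument.Parse(utf8Xml);
if (xDoc.Declaration != null)
{
    // Guarded because Declaration is null when the input has no XML declaration.
    xDoc.Declaration.Encoding = "utf-16";
}

XmlWriterSettings xws = new XmlWriterSettings();
xws.OmitXmlDeclaration = true; // the declaration is not written to the output
xws.Indent = true;
xws.Encoding = new UTF8Encoding(false); // UTF-8 without a byte order mark

MemoryStream ms = new MemoryStream();
using (XmlWriter xw = XmlWriter.Create(ms, xws))
{
    xDoc.WriteTo(xw);
} // disposing the writer flushes its buffer into the stream

// Convert the UTF-8 bytes to UTF-16 LE bytes, then decode them into a string.
Encoding utf8 = Encoding.UTF8;
Encoding utf16 = Encoding.Unicode;
byte[] utf16XmlArray = Encoding.Convert(utf8, utf16, ms.ToArray());
var utf16Xml = Encoding.Unicode.GetString(utf16XmlArray);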
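
As a possible alternative, here is a sketch that stays with the `StringWriter` approach from the question. It rests on an assumption the thread does not confirm: that `utf16Xml` was empty because the `XmlWriter` was never flushed or disposed before `writer.ToString()` was called. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XmlUtf16Sketch
{
    // Hypothetical helper: re-serializes XML text through a StringWriter.
    public static string Reserialize(string xmlText)
    {
        XDocument doc = XDocument.Parse(xmlText);

        // StringWriter reports UTF-16 as its encoding, which is what the
        // XmlWriter is expected to put into the XML declaration.
        using (var stringWriter = new StringWriter())
        {
            var settings = new XmlWriterSettings { Indent = true };
            using (XmlWriter xmlWriter = XmlWriter.Create(stringWriter, settings))
            {
                doc.WriteTo(xmlWriter);
            } // disposing the XmlWriter flushes its buffer into stringWriter

            // Read the result only after the XmlWriter has been disposed.
            return stringWriter.ToString();
        }
    }

    public static void Main()
    {
        // Made-up input with a utf-8 declaration, as in the question.
        string input = "<?xml version=\"1.0\" encoding=\"utf-8\"?><root><item>caf\u00e9</item></root>";
        Console.WriteLine(Reserialize(input));
    }
}
```

The design choice here is simply to let the `using` block end before reading the `StringWriter`; calling `xmlWriter.Flush()` explicitly would be another way to get the same effect.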
slash shogdhe

Given that XDocument.Parse only accepts a string, and that string in .NET is always UTF-16 Little Endian, it looks like you are going through a lot of steps to effectively do nothing. Either:

  1. The string, utf8Xml, is already UTF-16 LE and can be inserted into SQL Server as is (i.e. do nothing) as SqlDbType.Xml or SqlDbType.NVarChar,

    or

  2. utf8Xml somehow contains UTF-8 byte sequences, which would be invalid UTF-16 LE (i.e. "Unicode" in Microsoft-land) byte sequences. If this is the case, then you might be able to simply:
    1. add the XML declaration, stating that the encoding is UTF-8:
      xDoc.Declaration.Encoding = "utf-8";
    2. not omit the XML declaration:
      OmitXmlDeclaration = false;
    3. pass utf8Xml into SQL Server as SqlDbType.VarChar
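
Option 1 above could look roughly like the following sketch. The table name, column name, and parameter name are hypothetical, and it assumes `System.Data.SqlClient` is available (it ships with .NET Framework and is a separate package on newer runtimes). The command is only built here, not executed, so no database is needed to try it.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

public static class XmlInsertSketch
{
    // Hypothetical table and column names, for illustration only.
    private const string InsertSql =
        "INSERT INTO dbo.Documents (Payload) VALUES (@payload);";

    // Builds a parameterized INSERT that passes the string as is.
    public static SqlCommand BuildInsertCommand(SqlConnection connection, string xmlText)
    {
        var command = new SqlCommand(InsertSql, connection);

        // A .NET string is already UTF-16, so no conversion is done here.
        // SqlDbType.NVarChar would be the other choice named in option 1.
        command.Parameters.Add("@payload", SqlDbType.Xml).Value = xmlText;
        return command;
    }

    public static void Main()
    {
        // The connection has no connection string and is never opened here.
        using (var connection = new SqlConnection())
        using (SqlCommand command = BuildInsertCommand(connection, "<root><item>caf\u00e9</item></root>"))
        {
            Console.WriteLine(command.Parameters["@payload"].SqlDbType);
        }
    }
}
```

To actually run the insert, the connection would need a real connection string and a call to `connection.Open()` followed by `command.ExecuteNonQuery()`.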

For further explanation, please see my answer to the related question (here on S.O.):

How to solve “unable to switch the encoding” error when inserting XML into SQL Server

Solomon Rutzky