0

I have XML docs stored in a TEXT column (collation_name French_CI_AS, character_set_name iso_1).

I want to move them to a new table, in an XML column with the following SQL...

INSERT INTO Signature(JustifId, SignedJustif)
   SELECT JustifID, CONVERT(XML, Justif.SignedJustif,2)
   FROM Justif

When I do this, I get character encoding errors, that point to the high ascii character in this fragment "presentación, OU=CERES, O=FNMT-RCM, C=ES" - a spanish accented o in an X509 certificate.

This ó started life in utf8, became utf16 as a .net string, then became iso_1 when inserted into the TEXT column. I can copy and paste it into a web page no problem. How, then, do I move it from a TEXT column to an XML column in the same DB (and why is this so difficult?)?

The CONVERT idea came from this post. This MS page covers creating XML from varchar and nvarchar.

Community
  • 1
  • 1
bbsimonbb
  • 27,056
  • 15
  • 80
  • 110
  • 1
    Don't use Text columns. they are deprecated. Use varchar(max) instead. – Zohar Peled Jan 19 '17 at 11:57
  • @ZoharPeled I'm pretty sure, that exactly this is the reason for this issue. (Hopefully) OP wants to get rid of these deprecated columns and wants to store his XML in the appropriate type. – Shnugo Jan 19 '17 at 11:58
  • What is the SQL Server collation of the text column? – Dan Guzman Jan 19 '17 at 12:03
  • Nevermind. I missed it in your question. – Dan Guzman Jan 19 '17 at 12:04
  • here you have your answer using CONVERT and explanation too. [LINK](http://stackoverflow.com/questions/15953177/convert-string-into-xml-and-insert-sql-server) – M84 Jan 19 '17 at 12:34
  • @Mauro, Don't think so... Your linked question starts with *We have a SQL Server 2008 R2 database table with XML stored in a column of VARCHAR data type*. The issue there comes from an ecoding directive within the XML declaration. **In this case the XML is stored in deprecated `TEXT`** and the issue is *character encoding* on byte level... – Shnugo Jan 19 '17 at 12:36
  • @Shnugo I said that for the explanation in the answer, of course the context of the question is another, but it applies well the explanation of how the xml, varchar and encodings of these and other types work. And if you look after "solution" gives the answer as you well mention it in your answer. – M84 Jan 19 '17 at 12:55
  • 1
    @Mauro not quite... The *solution* there overcomes a completely different problem and just looks similar. If your XML starts with something like `` the written encoding cannot be overruled. Hence storing an XML with such a declaration in `NVARCHAR` would need a conversion to `VARCHAR` (and *vice versa* with `utf-16`). Sometimes it's the easiest approach just to cut off this declaration. Within **real XML** it is ommitted anyway... – Shnugo Jan 19 '17 at 13:09
  • @Shnugo It's true, thanks for the explanations. They help to understand both cases better. Thank you. – M84 Jan 19 '17 at 13:18

1 Answers1

2

This is tricky... A conversion on byte-level might lead to unexpected results...

Try this

INSERT INTO Signature(JustifId, SignedJustif)
SELECT JustifID, CONVERT(XML, CONVERT(VARCHAR(MAX),Justif.SignedJustif))
FROM Justif

If you still get issues, try to specify the specific collation together with the conversion and/or try to convert to NVARCHAR(MAX).

If this doesn't help, please edit your question and poste a (reduced) example. Best was a test-scenario with a minimal XML, where one can reproduce the error.

Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • Use NVARCHAR(MAX) instead of VARCHAR(MAX) for the specific characters. – Igor Micev Jan 19 '17 at 12:26
  • @IgorMicev, don't know for sure. At the moment I have no possibility to test this... The source text is not `NTEXT` but `TEXT`... Not sure how smart the conversion works. Further more: If the XML stored in `TEXT` includes an XML-declaration (haven't seen this), specifying the encoding with, let's say, `utf-8`, this would not work either... But you are right, `XML` should be handled as `XML` in the first place, as `NVARCHAR(MAX)` in the second and only in rare cases as `VARCHAR(MAX)`. *Normal* special characters like the accented `ó` should work with `VARCHAR` too... – Shnugo Jan 19 '17 at 12:31
  • My docs did have an xml declaration with an explicit utf-8 encoding, and I needed to get rid of this for the above solution to work, but I'm in business now. Yahoo. – bbsimonbb Jan 19 '17 at 13:28