1

I have a valid XML document (this has been confirmed using multiple XML validators including online validators and the Sublime Text XML validator plugin).

I receive the following error when attempting to import the XML document into MSSQL 2008 using a stored procedure named ImportNXML (command: exec [dbo].[ImportNXML];)

Msg 9420, Level 16, State 1, Line 2
XML parsing: line 17, character 35, illegal xml character

I have confirmed no illegal characters are in the XML document and line 17, character 35 is just the number 1. I've tried modifying this line, replacing the entire line with letters, replacing the entire line with a single number, padding other lines in the document before this line with letters/numbers, but i receive exactly the same error complaining about the exact same location.

If i open the ImportNXML stored procedure and run the query contents, i receive no errors at all.

What could be causing the stored procedure to fail when being executed using the 'exec' command but succeed when the procedure contents are executed as an expanded query?

Mock data for the first 17 lines is as follows:

<?xml version="1.0" ?>
<ClientData>
<Policy><policyName>The Policy Name</policyName>
<Preferences><ServerPreferences><preference><name>Sessions</name>
<value>3</value>
</preference>
<preference><name>Detection</name>
<value>yes</value>
</preference>
<preference><name>Mac</name>
<value>no</value>
</preference>
<preference><name>Plugin</name>
<value>108478;84316;32809;93635;36080;87560;61117;35292;75260;83156;61271;103773;12899;82513;56376;77796;85655;60338;56763;79951;</value>
</preference>
<preference><name>TARGET</name>
<value>123.123.123.123,234.234.234.234</value>

The portion of the stored proc that imports the XML is as follows:

EXEC(' INSERT INTO XmlImportTest(xmlFileName, xml_data) SELECT ''' + @importPath + ''', xmlData FROM ( SELECT * FROM OPENROWSET (BULK ''' + @importPath + ''' , SINGLE_BLOB) AS XMLDATA ) AS FileImport (XMLDATA) ') 
Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • 1
    Run this query in SSMS `select cast('click here' as xml)`, then click the link in the results window, select all and delete, then paste your XML document into the empty window. See how SSMS parses your XML document and red-underlines what it thinks is illegal. – James L. Apr 04 '18 at 00:29
  • 1
    How exactly are you providing the XML to the proc? The example command in your question doesn't have parameters. – Dan Guzman Apr 04 '18 at 00:33
  • Is there an extended character? I can't remember exactly what extended character it was that caused me issues, but I think it was a non-breaking space (ASCII `160`, Hex `0xA0`). – James L. Apr 04 '18 at 00:33
  • You might also add a `print @xmlData` to the stored proc just before the line where it throws the error, so you can see its output and count to line 17 character 35. It may not be the same line/character as your input buffer. – James L. Apr 04 '18 at 00:41
  • Can you attach the xml? – Xedni Apr 04 '18 at 00:45
  • @JamesL. There are no red-underlines but from line 21 onwards the syntax highlighting myseriously turns off. Line 20 = plugin_set Line 21 = 108478;84316;etc... – 16b7195abb140a3929bbc322d1c6f1 Apr 04 '18 at 01:10
  • 1
    If the XML isn't confidential data, it would really help if you added it to the question (at least the first 30 lines or so...). – James L. Apr 04 '18 at 01:12
  • Unfortunately the XML data is confidential. The portion of the stored proc that imports the XML is below: `EXEC(' INSERT INTO XmlImportTest(xmlFileName, xml_data) SELECT ''' + @importPath + ''', xmlData FROM ( SELECT * FROM OPENROWSET (BULK ''' + @importPath + ''' , SINGLE_BLOB) AS XMLDATA ) AS FileImport (XMLDATA) ')` – 16b7195abb140a3929bbc322d1c6f1 Apr 04 '18 at 01:14
  • Can you reproduce the issue with mock data that isn't confidential? Either way the process of trying to do so may elucidate where it's going awry. The obvious low hanging fruit has been elimjinated and we 're probably going to have to see some sort of xml to make much progress – Xedni Apr 04 '18 at 01:20
  • @Xedni I've added mock data for the first 17 lines. The last line is Line 17. – 16b7195abb140a3929bbc322d1c6f1 Apr 04 '18 at 01:30
  • Thanks, but first, thats not a valid xml document in and of itself, so I would have to infer closing tags which may be presumptuous. Even so, when I do that, it casts as XML just fine. Can you attach xml which will *reproduce* the error? – Xedni Apr 04 '18 at 01:38
  • @Xedni The document is huge (in the MB range and has confidential material throughout) so attaching here is not an option. Also, as i mentioned, if i expand the stored procedure and then execute, there are no errors. The illegal char error only occurs when i call the stored procedure, so i believe MSSQL is doing something differently when the stored proc is executed, correct? – 16b7195abb140a3929bbc322d1c6f1 Apr 04 '18 at 01:55
  • 1
    Is it possible you've still got the data in binary? I created a dummy document consisting of `` then ran `select * from openrowset(bulk 'C:\test.xml', single_blob) a`. The result was `0x3C783E0D0A3C2F783E` Wondering if maybe it's not converting correctly. I know that doesn't quite explain various aspects of it, but it's a theory – Xedni Apr 04 '18 at 02:12
  • @Xedni this could be the issue, but what would the workaround be? – 16b7195abb140a3929bbc322d1c6f1 Apr 04 '18 at 02:34
  • You could try casting `cast(BulkColumn as xml` and see if that does the trick. Alternatively you could just cast it as varchar/nvarchar(max) so you at least have the string, and then work with it from there. Both worked for my trivial XML – Xedni Apr 04 '18 at 02:39

1 Answers1

1

Pure guessing:

  • The file is utf-8 encoded (or any other encoding, SQL-Server 2008 cannot read natively).
    • You must know, that SQL-Server is rather limited with file encodings. CHAR (or VARCHAR) is extended ASCII 1-byte encoding and NCHAR (or NVARCHAR) is UCS-2 2-byte encoding (which is almost identical with UTF-16).
    • With SQL-Server 2016 (and SP2 for v2014) some further support was introduced, especially for utf-8.
    • Try to open your XML with an appropriate editor (e.g. notepad++) and try to find out the file's encoding. Try to save this as "unicode / UCS-2 / utf-16" and retry the import.
    • Try to use your import with CLOB instead of BLOB. Reading the file as binary LargeObject will take the bytes one after the next. SQL-Server will try to read these bytes as string with fixed size per character. A character LOB might work under special circumstances.
    • Check the first two bytes for a BOM (byte order mark)
  • There is some dirt within your XML
    • Open the file with an HEX-editor and try to find strange codes
  • Your code processes the file's content within a dynamically created statement.
    • In such cases sometimes you run into truncation or string-breaking quotes
  • General hint:
    • If you import data and you expect issues it is highly recommended to use a 2-step-approach
    • Read your file into a tolerant staging table (with NVARCHAR(MAX) or even VARBIANRY(MAX) target columns) and try to continue with this.
    • It might be necessary to use another tool to change your file before the import.
Shnugo
  • 66,100
  • 9
  • 53
  • 114