0

Is there any way in SQL Server / transact SQL to parse (malformed) XML that is missing the quotes that go around attributes, such as:

SELECT CAST('<test A=B />' AS XML)

The above fails with:

XML parsing: line 1, character 9, A string literal was expected

Whereas parsing the following succeeds:

SELECT CAST('<test A="B" />' AS XML)
Tom Hunter
  • 5,714
  • 10
  • 51
  • 76
  • 1
    MSSQL expects valid XML. You need to make the XML valid first. I have not seen a way to do this with pure SQL, but if you can use Java or C#... http://stackoverflow.com/questions/20125891/find-html-attributes-without-quotes-and-add-them-back-in – DaFi4 Oct 03 '16 at 11:21
  • https://github.com/MindTouch/SGMLReader – DaFi4 Oct 03 '16 at 11:31

2 Answers2

1

Your assumptions are incorrect. You can't parse XML because it's not... XML. If you read XML specification, 2.3 Common Syntactic Constructs, you'll see:

AttValue       ::=  '"' ([^<&"] | Reference)* '"'
                 |  "'" ([^<&'] | Reference)* "'"

Attributes must be quoted with " or '.

Paweł Dyl
  • 8,888
  • 1
  • 11
  • 27
1

The correct way to fix this issue is to fix the XML at its source. If, however for any reason this is not possible then you can probably fix the xml via a string splitter function and basic string manipulation. This method does assume a relatively simple xml and probably won’t work well for large or complex xml strings.

First you will have to create a string splitter function, there are plenty of example here on SO but I have included an example below for completeness:

CREATE FUNCTION [dbo].[SplitString] 
( 
    @string NVARCHAR(MAX), 
    @delimiter CHAR(1) 
) 
RETURNS @output TABLE(splitdata NVARCHAR(MAX) 
) 
BEGIN 
    DECLARE @start INT, @end INT 
    SELECT @start = 1, @end = CHARINDEX(@delimiter, @string) 
    WHILE @start < LEN(@string) + 1 BEGIN 
        IF @end = 0  
            SET @end = LEN(@string) + 1

        INSERT INTO @output (splitdata)  
        VALUES(SUBSTRING(@string, @start, @end - @start)) 
        SET @start = @end + 1 
        SET @end = CHARINDEX(@delimiter, @string, @start)

    END 
    RETURN 
END

Next split the xml string into multiple rows for each occurrence of the assignment operator ‘=’. Then find the occurrence of the values assigned to attributes using pattern matching functions and replace values with quoted values. The final step in the query joins the split xml string back into a single xml.

DECLARE @malformedXmlString NVARCHAR(MAX) = '<test A=B width = 1000 height= 800 priority =high name="fred" />' --'<test A=BCD>DATA<\test>'

DECLARE @xmlSplit TABLE
(
     ID INT IDENTITY
    ,splitdata NVARCHAR(MAX)
)

INSERT INTO @xmlSplit
(
    splitdata
)
SELECT LTRIM(RTRIM(splitdata)) AS splitdata
FROM    [dbo].[SplitString](@malformedXmlString, '=')



UPDATE @xmlSplit
SET     splitdata = UpdatedXml.splitdata
FROM    @xmlSplit OrginalXml
INNER JOIN  (
                SELECT  ID
                         -- Use the PATINDEX function to determine the position in the string where the attribute values end. Replace value with quoted version of value. 
                        ,REPLACE(splitdata
                                ,LTRIM(RTRIM(LEFT(splitdata, PATINDEX('%[ />]%', splitdata) -1)))
                                ,'"' + LTRIM(RTRIM(LEFT(splitdata, PATINDEX('%[ />]%', splitdata) -1))) + '"') AS splitdata 
                FROM    @xmlSplit
                WHERE   splitdata LIKE '[a-zA-Z0-9]%[ />]%' -- Only return occurrences of string which start with an alpha numeric character and ends with a space, ‘/’ or ‘>’ character. This should be your value of the attribute we split the string on.
            ) UpdatedXml ON OrginalXml.ID = UpdatedXml.ID


DECLARE @xmlString NVARCHAR(MAX);

SELECT @xmlString = COALESCE(@xmlString + '=', '') + CONVERT(NVARCHAR(MAX), splitdata)
FROM @xmlSplit

SELECT CAST(@xmlString AS XML)
Edmond Quinton
  • 1,709
  • 9
  • 10