1

For a table structure as follows:

LibraryID(INT)  XMLData(NVARCHAR(MAX))
-----------     --------------------
1               <Library xmlns:xsi="http:...
2               <Library xmlns:xsi="http:...
3               <Library xmlns:xsi="http:...

The XMLData where TableID=1 holds a values such as:

<Library xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <Books>
        <LibraryBook>
            <Author>Author 1</Author>
            <Title>Title 1</Title>
        </LibraryBook>
        <LibraryBook>
            <Author>Author 2</Author>
            <Title>Title 2</Title>
        </LibraryBook>
        <LibraryBook>
            <Author>Author 3</Author>
            <Title>Title 3</Title>
        </LibraryBook>
    </Books>
    <Magazines>
    ...
    </Magazines>
</Library>  

I would like the output to be:

LibraryID(INT)  Author      Title
-----------     ---------   -------
1               Author 1    Title 1
1               Author 2    Title 2
1               Author 3    Title 3
2               ...         ...
3               ...         ...  
3               ...         ...

I have tried the query below:

;WITH XmlData AS
(
    SELECT
        LibraryID,
        XmlNodes = CAST(XmlData AS XML)
    FROM
        Library
)
,BrokenDown AS
(
    SELECT 
        LibraryID, 
        Author = XmlNodes.value('(/Library/Books/LibraryBook/Author)[1]', 'VARCHAR(100)'),
        Title = XmlNodes.value('(/Library/Books/LibraryBook/Title)[1]', 'VARCHAR(100)')
    FROM 
        XmlData
)

SELECT * FROM BrokenDown

The output only lists the first book and title per LibraryID:

LibraryID    Author      Title
---------    ------      -----
1            Author 1    Title 1
2            ...         ...
3            ...         ...

Any help would be greatly appreciated.

Ross Bush
  • 14,648
  • 2
  • 32
  • 55

2 Answers2

2

Something like this:

use tempdb
go
drop table if exists Library
go

create table Library(LibraryId int identity primary key, XmlData xml)

go

insert into Library(XmlData)
values 
('<Library xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <Books>
        <LibraryBook>
            <Author>Author 1</Author>
            <Title>Title 1</Title>
        </LibraryBook>
        <LibraryBook>
            <Author>Author 2</Author>
            <Title>Title 2</Title>
        </LibraryBook>
        <LibraryBook>
            <Author>Author 3</Author>
            <Title>Title 3</Title>
        </LibraryBook>
    </Books>
    <Magazines>
    ...
    </Magazines>
</Library>  '
),
('<Library xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <Books>
        <LibraryBook>
            <Author>Author 1</Author>
            <Title>Title 1</Title>
        </LibraryBook>
        <LibraryBook>
            <Author>Author 2</Author>
            <Title>Title 2</Title>
        </LibraryBook>
        <LibraryBook>
            <Author>Author 3</Author>
            <Title>Title 3</Title>
        </LibraryBook>
    </Books>
    <Magazines>
    ...
    </Magazines>
</Library>  '
),
('<Library xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <Books>
        <LibraryBook>
            <Author>Author 1</Author>
            <Title>Title 1</Title>
        </LibraryBook>
        <LibraryBook>
            <Author>Author 2</Author>
            <Title>Title 2</Title>
        </LibraryBook>
        <LibraryBook>
            <Author>Author 3</Author>
            <Title>Title 3</Title>
        </LibraryBook>
    </Books>
    <Magazines>
    ...
    </Magazines>
</Library>  '
)

select l.LibraryId, 
       book.value('(Author)[1]','varchar(20)') Author, 
       book.value('(Title)[1]','varchar(20)') Title 
from Library l
outer apply l.XmlData.nodes('Library/Books/LibraryBook') books(book)

outputs

LibraryId   Author               Title
----------- -------------------- --------------------
1           Author 1             Title 1
1           Author 2             Title 2
1           Author 3             Title 3
2           Author 1             Title 1
2           Author 2             Title 2
2           Author 3             Title 3
3           Author 1             Title 1
3           Author 2             Title 2
3           Author 3             Title 3

(9 rows affected)
David Browne - Microsoft
  • 80,331
  • 6
  • 39
  • 67
  • 1
    hehe - you've "silently" made all the adaptations I did, too - datatype `xml`, fixing non-matching tags etc. ;-) – marc_s Jul 09 '20 at 16:41
  • 1
    Thanks the OUTER APPLY worked. When I get multiple great answers, I always mark the first one as answered. Thanks for the help! – Ross Bush Jul 09 '20 at 16:52
2

Once you've "cleansed" your XML (the end tags </FieldName> and </DisplayIndex> don't match the start tags <Author> and <Title> ....) - and once you've defined your column as XML (since this contains nothing but XML - why isn't it declared as XML in the first place???) - you can try this:

SELECT
    LibraryID,
    Author = XC.value('(Author/text())[1]', 'varchar(50)'),
    Title = XC.value('(Title/text())[1]', 'varchar(50)')
FROM
    dbo.Library
CROSS APPLY
    XmlData.nodes('/Library/Books/LibraryBook') AS XT(XC)

If you must keep the not very useful NVARCHAR(MAX) datatype - then you need that "convert to XML" CTE ahead of your SELECT like this:

;WITH XmlCte AS 
(
    SELECT 
        LibraryId, 
        RealXmlData = CAST(XmlData AS XML)
    FROM dbo.Library
)
SELECT
    LibraryId,
    Author = XC.value('(Author/text())[1]', 'varchar(50)'),
    Title = XC.value('(Title/text())[1]', 'varchar(50)')
FROM
    RealXmlData
CROSS APPLY
    XmlData.nodes('/Library/Books/LibraryBook') AS XT(XC)

Update: added the /text() expressions to both XML elements being selected in the XQuery - thanks to @YitzhakKhabinsky for this tip - speeds up the query quite significantly!

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • I tried to simplify my real world problem using a elementary concept, those were typos from that effort. I don't know why NVARCHAR was chosen, the system is legacy. – Ross Bush Jul 09 '20 at 16:44
  • Marc, please connect with me on LInkedIn. I sent you a connect invite some time ago. – Yitzhak Khabinsky Jul 09 '20 at 17:13
  • @YitzhakKhabinsky: those "corrections" of yours really aren't necessary, nor helpful - this code as shown here works **just fine** - what's the point of adding `/text()` to every XML element name in a `.value()` function call ??? – marc_s Jul 09 '20 at 18:15
  • @marc_s. Just run a simple test while **Actual Execution Plan** is On. Two SELECTs in one batch together, one without the `/text()` and one with the `/text()`. You will see the difference. – Yitzhak Khabinsky Jul 09 '20 at 21:03
  • @marc_s. **Query cost (relative to the batch)** will show you the difference. – Yitzhak Khabinsky Jul 09 '20 at 21:21
  • @YitzhakKhabinsky: any explanations / reference (blog post etc.) as to *why* adding `text()` makes the query this much faster? – marc_s Jul 10 '20 at 03:50
  • 2
    @marc_s Michael Rys talks about it [here](https://channel9.msdn.com/events/TechEd/2012/DBI404?term=michael%20rys&lang-en=true). The entire presentation is very good but the part of using `text()` is from 34:00 to 39:00. It should be used for untyped XML when you know you are not dealing with mixed content XML. Without the text() you will get the concatenation of all text values of all subnodes and if you know you only have one value in a node that extra part of the execution plan that does the concatenation is unnecessary. – Mikael Eriksson Jul 10 '20 at 06:47
  • 1
    I did a [QOTD](https://www.sqlservercentral.com/questions/value-method-xml-data-type) on SQL Server Central once that showed some of the behavior. – Mikael Eriksson Jul 10 '20 at 06:47
  • @YitzhakKhabinsky: I don't see any Linkedin invite requests - can you send it again? – marc_s Jul 10 '20 at 06:59
  • 1
    @marc_s It's not only about performance... You might read [this](https://stackoverflow.com/a/50170995/5089204) for some details. You can try `SELECT CAST('text1inner bmore text' AS XML).value('(/a)[1]','varchar(100)')` and now try `.value('(/a/text())[1]','varchar(100)')` or `.value('(/a/text()[2])[1]','varchar(100)')`. And - if interested - [this](https://stackoverflow.com/a/43242238/5089204) about performance. – Shnugo Jul 10 '20 at 09:52
  • @MikaelEriksson, the links I just placed in a comment above might be interesting too. – Shnugo Jul 10 '20 at 09:57
  • 2
    @Shnugo, Mikael Eriksson, Marc. Thank you all. Peace is restored. – Yitzhak Khabinsky Jul 10 '20 at 12:19
  • @marc_s. Here is my LinkedIn profile: https://www.linkedin.com/in/yitzhakkhabinsky/ – Yitzhak Khabinsky Jul 10 '20 at 12:41