How to extract value form XML?

Question

I have the XML variable defined below and its value. Please help

DECLARE @xml2 as XML ;                          
SET @xml2 = '<Student>
  <Marks>
    <Subject>Science</Subject>
    <Score>89</Score>
    <Subject>Maths</Subject>
    <Score>90</Score>
  </Marks>
</Student>'

Expected result should be:

Subject  Score
-------- ------
Science  89
Maths    90

Which options from [the documentation](https://learn.microsoft.com/en-us/sql/t-sql/xml/xml-data-type-methods) have you already tried? Please include your attempt in the question. — Sander, Sep 09 '20 at 19:15
Please restore the textual expected output instead of the image... Select your expected output in the editor and hit Ctrl+K to indent as a code/data section. — Sander, Sep 09 '20 at 19:21
You can use [xpath](https://learn.microsoft.com/en-us/sql/relational-databases/sqlxml-annotated-xsd-schemas-xpath-queries/introduction-to-using-xpath-queries-sqlxml-4-0?view=sql-server-ver15) — Igor, Sep 09 '20 at 19:29
Now for the question, if you extract the values, how will you link `Science` with `89` and not with, say `90`? The values/tags are not grouped under a separate node... — Sander, Sep 09 '20 at 19:29
The tags are not under a separate node. this was an existing XML structure. I cannot changes structure. I need the result as in the screenshot. Is it possible ? Sandar — Vinoth, Sep 09 '20 at 19:37
Does this answer your question? [Convert Xml to Table SQL Server](https://stackoverflow.com/questions/3989395/convert-xml-to-table-sql-server) — iceblade, Sep 09 '20 at 19:38

Yitzhak Khabinsky · Answer 1 · 2020-09-09T23:15:27.357

Another solution for unlimited number of pairs of <Subject> and <Score> elements.

It shows power of T-SQL and XQuery FLWOR expression.

Method #1 is a two step process:

(1) Transform XML into the following format:

<root>
  <r subject="Science" score="89" />
  <r subject="Maths" score="90" />
  ...
</root>

(2) Shred into rectangular/relational format

SQL

DECLARE @xml as XML = 
N'<Student>
  <Marks>
    <Subject>Science</Subject>
    <Score>89</Score>
    <Subject>Maths</Subject>
    <Score>90</Score>
    <Subject>History</Subject>
    <Score>100</Score>
  </Marks>
</Student>';

;WITH rs AS
(
    SELECT @xml.query('<root>
    {
        for $x in /Student/Marks/*[position() mod 2 = 1]
        let $pos := count(/Student/Marks/*[. << $x[1]]) + 1
        return <r subject="{$x/text()}" score="{/Student/Marks/*[$pos + 1]}"/>
    }
    </root>') AS xmldata
)
SELECT c.value('@subject', 'VARCHAR(30)') AS [Subject]
    , c.value('@score', 'INT') AS [Score]
FROM rs CROSS APPLY xmldata.nodes('/root/r') AS t(c);

Output

+---------+-------+
| Subject | Score |
+---------+-------+
| Science |    89 |
| Maths   |    90 |
| History |   100 |
+---------+-------+

Let's apply the same technique, but without CTE and XML transformation. It becomes much shorter and more performant.

Method #2

SELECT c.value('(./text())[1]', 'VARCHAR(30)') AS [Subject]
    , c.value('(/Student/Marks/*[sql:column("w.r")]/text())[1]', 'INT') AS [Score]
FROM @xml.nodes('/Student/Marks/*[position() mod 2 = 1]') AS t(c)
    CROSS APPLY (SELECT t.c.value('let $n := . return count(/Student/Marks/*[. << $n[1]]) + 2','INT') AS r
         ) AS w;

I upvoted this looking at the *elegance*, but I doubt the performance of `*[. << $x[1]]`. In such cases I tend to use a tally from a `count()` and grab all Elements by their position. What happens with your approach, if there might be other Elements somewhere in between? — Shnugo, Sep 10 '20 at 09:04

Shnugo · Answer 2 · 2020-09-11T08:40:26.560

And one more approach, which should be a little faster...

DECLARE @xml2 as XML ;                          
SET @xml2 = '<Student>
  <Marks>
    <Subject>Science</Subject>
    <Score>89</Score>
    <Subject>Maths</Subject>
    <Score>90</Score>
  </Marks>
</Student>';

WITH tally(Nmbr) AS(SELECT TOP(@xml2.value('count(/Student/Marks/Subject)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT tally.Nmbr
      ,@xml2.value('(/Student/Marks/Subject[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS [Subject] 
      ,@xml2.value('(/Student/Marks/Score[sql:column("tally.Nmbr")]/text())[1]','int') AS Score 
FROM tally;

The idea in short:

We create a tally on the fly by using a computed TOP clause together with ROW_NUMBER() against any table with a larger row count (I use master..spt_values here, best was a physical numbers table...)
Now we can grab each value by its position using sql:column() to get the tally's current value into the XQuery.
This means: We pick the first Subject with the first Score. Than the second Subject with the second score and so on...

Hint: This format is very erronous. If this is under your control you really should change it. You are relying completely on the element's order and position. A missing element or any mix-up or other elements in between could tear this down to the ground.

I'd use something like

<Student>
  <Marks Subject="Science" Score="80"/>
  <Marks Subject="Maths" Score="90"/>
</Student>

or

<Student>
  <Marks>
    <Subject name="Science">80</Subject>
    <Subject name="Maths">90</Subject>
  </Marks>
</Student>

UPDATE Benchmark

The following will compare a XML with 10 / 100 / 1000 pairs in odd/even structure:

--Make sure to use a database, where this table returns at least 1000 rows (or use any other table)

SELECT COUNT(*) FROM master..spt_values

--Filling a table with dummy data

DECLARE @tbl TABLE(ID INT IDENTITY,[Subject] VARCHAR(30),Score VARCHAR(30));
INSERT INTO @tbl 
SELECT TOP 1000 LEFT(CAST(NEWID() AS varchar(50)),30),CAST(CAST(NEWID() AS binary(4)) AS INT)
FROM master..spt_values;
SELECT * FROM @tbl;

--using three XMLs with different count of pairs

DECLARE @xml10 XML;
DECLARE @xml100 XML;
DECLARE @xml1000 XML;

SET @xml10=(
    SELECT TOP 10
           (SELECT [Subject] FOR XML PATH(''),TYPE) AS [*]
          ,(SELECT [Score] FOR XML PATH(''),TYPE) AS [*]
    FROM @tbl t
    ORDER BY t.ID
    FOR XML PATH(''),ROOT('root')
);


SET @xml100=(
    SELECT TOP 100
           (SELECT [Subject] FOR XML PATH(''),TYPE) AS [*]
          ,(SELECT [Score] FOR XML PATH(''),TYPE) AS [*]
    FROM @tbl t
    ORDER BY t.ID
    FOR XML PATH(''),ROOT('root')
);


SET @xml1000=(
    SELECT TOP 1000
           (SELECT [Subject] FOR XML PATH(''),TYPE) AS [*]
          ,(SELECT [Score] FOR XML PATH(''),TYPE) AS [*]
    FROM @tbl t
    ORDER BY t.ID
    FOR XML PATH(''),ROOT('root')
);

--test for 10

DECLARE @d DATETIME2=SYSUTCDATETIME();
WITH tally(Nmbr) AS(SELECT TOP(@xml10.value('count(/root/Subject)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT tally.Nmbr
      ,@xml10.value('(/root/Subject[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS [Subject] 
      ,@xml10.value('(/root/Score[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS Score 
INTO #t10a
FROM tally;
SELECT 'xml10 a',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT c.value('(./text())[1]', 'nvarchar(max)') AS [Subject]
    , c.value('(/root/*[sql:column("w.r")]/text())[1]', 'nvarchar(max)') AS [Score]
INTO #t10b
FROM @xml10.nodes('/root/*[position() mod 2 = 1]') AS t(c)
    CROSS APPLY (SELECT t.c.value('let $n := . return count(/root/*[. << $n[1]]) + 2','INT') AS r
         ) AS w;
SELECT 'xml10 b',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

--test for 100

SET @d =SYSUTCDATETIME();
WITH tally(Nmbr) AS(SELECT TOP(@xml100.value('count(/root/Subject)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT tally.Nmbr
      ,@xml100.value('(/root/Subject[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS [Subject] 
      ,@xml100.value('(/root/Score[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS Score 
INTO #t100a
FROM tally;
SELECT 'xml100 a',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT c.value('(./text())[1]', 'nvarchar(max)') AS [Subject]
    , c.value('(/root/*[sql:column("w.r")]/text())[1]', 'nvarchar(max)') AS [Score]
INTO #t100b
FROM @xml100.nodes('/root/*[position() mod 2 = 1]') AS t(c)
    CROSS APPLY (SELECT t.c.value('let $n := . return count(/root/*[. << $n[1]]) + 2','INT') AS r
         ) AS w;
SELECT 'xml100 b',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

--test for 1000

SET @d =SYSUTCDATETIME();
WITH tally(Nmbr) AS(SELECT TOP(@xml1000.value('count(/root/Subject)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT tally.Nmbr
      ,@xml1000.value('(/root/Subject[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS [Subject] 
      ,@xml1000.value('(/root/Score[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS Score 
INTO #t1000a
FROM tally;
SELECT 'xml1000 a',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT c.value('(./text())[1]', 'nvarchar(max)') AS [Subject]
    , c.value('(/root/*[sql:column("w.r")]/text())[1]', 'nvarchar(max)') AS [Score]
INTO #t1000b
FROM @xml1000.nodes('/root/*[position() mod 2 = 1]') AS t(c)
    CROSS APPLY (SELECT t.c.value('let $n := . return count(/root/*[. << $n[1]]) + 2','INT') AS r
         ) AS w;
SELECT 'xml1000 b',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

Method a is my approach using a tally, method b is Yitzhak's approach using XQuery.

The difference between these two approaches is rather small

  10 Elements a=7ms     / b=6ms
 100 Elements a=83ms    / b=79ms
1000 Elements a=8942ms  / b=8721ms

Some general differences:

The tally-approach would work with triples or more elements per serie as well.
The tally approach would still work with other elements in between
the XQuery approach would deal better with unexpectedly missing elements, but both approaches would not return correctly, if just one expected element was missing.

Shnugo, I completely agree with you on the input XML structure. It is very fragile. It seems like a some course exercise on how to pivot XML structure based on odd vs. even position in a sequence. — Yitzhak Khabinsky, Sep 10 '20 at 12:49
Shnugo, I up-voted your solution too. My benchmark on the tiny input XML with three subjects provided the following outcome: my method #2 vs. your approach: Query cost (relative to the batch): 39% vs 61%, in favor of method #2 — Yitzhak Khabinsky, Sep 10 '20 at 13:07
@YitzhakKhabinsky, see my update section. I had expected, that your approach would get slow with many elements, but it is almost equal... — Shnugo, Sep 11 '20 at 08:33
Shnugo, thanks a lot for for the effort, benchmark, and great explanation. — Yitzhak Khabinsky, Sep 11 '20 at 12:07

Sander · Answer 3 · 2020-09-09T20:18:38.143

1

Without a link between the <Subject> and the <Score> tag, you could try this. The row number that is generated as a link between both tags relies on the SQL engine to return the rows in the proper order.

with cte_sub as
(
  select row_number() over(order by x.Sub) as Num,
         x.Sub.value('.', 'nvarchar(10)') as Subject
  from @xml2.nodes('/Student/Marks/Subject') as x(Sub)
),
cte_sco as
(
  select row_number() over(order by y.Sco) as Num,
         y.Sco.value('.', 'int') as Score
  from @xml2.nodes('/Student/Marks/Score') as y(Sco)
)
select c1.Subject, c2.Score
from cte_sub c1
join cte_sco c2
  on c2.Num = c1.Num;

Fiddle

edited Sep 09 '20 at 20:18

answered Sep 09 '20 at 19:38

Sander

3,942
2
17
22

Xml processing is always ordered...change each (select null) to x.sub & y.sco – lptr Sep 09 '20 at 20:12
@lptr, This is new information for me, got any form of official documentation for this claim? Answer updated. – Sander Sep 09 '20 at 20:19
1

http://dataeducation.com/uniquely-identifying-xml-nodes-with-dense_rank/ – lptr Sep 10 '20 at 08:26

How to extract value form XML?

3 Answers3

Linked