0

I have the XML variable defined below and its value. Please help

DECLARE @xml2 as XML ;                          
SET @xml2 = '<Student>
  <Marks>
    <Subject>Science</Subject>
    <Score>89</Score>
    <Subject>Maths</Subject>
    <Score>90</Score>
  </Marks>
</Student>'

Expected result should be:

Subject  Score
-------- ------
Science  89
Maths    90
Yitzhak Khabinsky
  • 18,471
  • 2
  • 15
  • 21
Vinoth
  • 19
  • 1
  • Which options from [the documentation](https://learn.microsoft.com/en-us/sql/t-sql/xml/xml-data-type-methods) have you already tried? Please include your attempt in the question. – Sander Sep 09 '20 at 19:15
  • 1
    Please restore the textual expected output instead of the image... Select your expected output in the editor and hit Ctrl+K to indent as a code/data section. – Sander Sep 09 '20 at 19:21
  • You can use [xpath](https://learn.microsoft.com/en-us/sql/relational-databases/sqlxml-annotated-xsd-schemas-xpath-queries/introduction-to-using-xpath-queries-sqlxml-4-0?view=sql-server-ver15) – Igor Sep 09 '20 at 19:29
  • Now for the question, if you extract the values, how will you link `Science` with `89` and not with, say `90`? The values/tags are not grouped under a separate node... – Sander Sep 09 '20 at 19:29
  • The tags are not under a separate node. this was an existing XML structure. I cannot changes structure. I need the result as in the screenshot. Is it possible ? Sandar – Vinoth Sep 09 '20 at 19:37
  • Does this answer your question? [Convert Xml to Table SQL Server](https://stackoverflow.com/questions/3989395/convert-xml-to-table-sql-server) – iceblade Sep 09 '20 at 19:38

3 Answers3

3

Another solution for unlimited number of pairs of <Subject> and <Score> elements.

It shows power of T-SQL and XQuery FLWOR expression.

Method #1 is a two step process:

(1) Transform XML into the following format:

<root>
  <r subject="Science" score="89" />
  <r subject="Maths" score="90" />
  ...
</root>

(2) Shred into rectangular/relational format

SQL

DECLARE @xml as XML = 
N'<Student>
  <Marks>
    <Subject>Science</Subject>
    <Score>89</Score>
    <Subject>Maths</Subject>
    <Score>90</Score>
    <Subject>History</Subject>
    <Score>100</Score>
  </Marks>
</Student>';

;WITH rs AS
(
    SELECT @xml.query('<root>
    {
        for $x in /Student/Marks/*[position() mod 2 = 1]
        let $pos := count(/Student/Marks/*[. << $x[1]]) + 1
        return <r subject="{$x/text()}" score="{/Student/Marks/*[$pos + 1]}"/>
    }
    </root>') AS xmldata
)
SELECT c.value('@subject', 'VARCHAR(30)') AS [Subject]
    , c.value('@score', 'INT') AS [Score]
FROM rs CROSS APPLY xmldata.nodes('/root/r') AS t(c);

Output

+---------+-------+
| Subject | Score |
+---------+-------+
| Science |    89 |
| Maths   |    90 |
| History |   100 |
+---------+-------+

Let's apply the same technique, but without CTE and XML transformation. It becomes much shorter and more performant.

Method #2

SELECT c.value('(./text())[1]', 'VARCHAR(30)') AS [Subject]
    , c.value('(/Student/Marks/*[sql:column("w.r")]/text())[1]', 'INT') AS [Score]
FROM @xml.nodes('/Student/Marks/*[position() mod 2 = 1]') AS t(c)
    CROSS APPLY (SELECT t.c.value('let $n := . return count(/Student/Marks/*[. << $n[1]]) + 2','INT') AS r
         ) AS w;
Yitzhak Khabinsky
  • 18,471
  • 2
  • 15
  • 21
  • I upvoted this looking at the *elegance*, but I doubt the performance of `*[. << $x[1]]`. In such cases I tend to use a tally from a `count()` and grab all Elements by their position. What happens with your approach, if there might be other Elements somewhere in between? – Shnugo Sep 10 '20 at 09:04
3

And one more approach, which should be a little faster...

DECLARE @xml2 as XML ;                          
SET @xml2 = '<Student>
  <Marks>
    <Subject>Science</Subject>
    <Score>89</Score>
    <Subject>Maths</Subject>
    <Score>90</Score>
  </Marks>
</Student>';

WITH tally(Nmbr) AS(SELECT TOP(@xml2.value('count(/Student/Marks/Subject)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT tally.Nmbr
      ,@xml2.value('(/Student/Marks/Subject[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS [Subject] 
      ,@xml2.value('(/Student/Marks/Score[sql:column("tally.Nmbr")]/text())[1]','int') AS Score 
FROM tally;

The idea in short:

  • We create a tally on the fly by using a computed TOP clause together with ROW_NUMBER() against any table with a larger row count (I use master..spt_values here, best was a physical numbers table...)
  • Now we can grab each value by its position using sql:column() to get the tally's current value into the XQuery.
  • This means: We pick the first Subject with the first Score. Than the second Subject with the second score and so on...

Hint: This format is very erronous. If this is under your control you really should change it. You are relying completely on the element's order and position. A missing element or any mix-up or other elements in between could tear this down to the ground.

I'd use something like

<Student>
  <Marks Subject="Science" Score="80"/>
  <Marks Subject="Maths" Score="90"/>
</Student>

or

<Student>
  <Marks>
    <Subject name="Science">80</Subject>
    <Subject name="Maths">90</Subject>
  </Marks>
</Student>

UPDATE Benchmark

The following will compare a XML with 10 / 100 / 1000 pairs in odd/even structure:

--Make sure to use a database, where this table returns at least 1000 rows (or use any other table)

SELECT COUNT(*) FROM master..spt_values

--Filling a table with dummy data

DECLARE @tbl TABLE(ID INT IDENTITY,[Subject] VARCHAR(30),Score VARCHAR(30));
INSERT INTO @tbl 
SELECT TOP 1000 LEFT(CAST(NEWID() AS varchar(50)),30),CAST(CAST(NEWID() AS binary(4)) AS INT)
FROM master..spt_values;
SELECT * FROM @tbl;

--using three XMLs with different count of pairs

DECLARE @xml10 XML;
DECLARE @xml100 XML;
DECLARE @xml1000 XML;

SET @xml10=(
    SELECT TOP 10
           (SELECT [Subject] FOR XML PATH(''),TYPE) AS [*]
          ,(SELECT [Score] FOR XML PATH(''),TYPE) AS [*]
    FROM @tbl t
    ORDER BY t.ID
    FOR XML PATH(''),ROOT('root')
);


SET @xml100=(
    SELECT TOP 100
           (SELECT [Subject] FOR XML PATH(''),TYPE) AS [*]
          ,(SELECT [Score] FOR XML PATH(''),TYPE) AS [*]
    FROM @tbl t
    ORDER BY t.ID
    FOR XML PATH(''),ROOT('root')
);


SET @xml1000=(
    SELECT TOP 1000
           (SELECT [Subject] FOR XML PATH(''),TYPE) AS [*]
          ,(SELECT [Score] FOR XML PATH(''),TYPE) AS [*]
    FROM @tbl t
    ORDER BY t.ID
    FOR XML PATH(''),ROOT('root')
);

--test for 10

DECLARE @d DATETIME2=SYSUTCDATETIME();
WITH tally(Nmbr) AS(SELECT TOP(@xml10.value('count(/root/Subject)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT tally.Nmbr
      ,@xml10.value('(/root/Subject[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS [Subject] 
      ,@xml10.value('(/root/Score[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS Score 
INTO #t10a
FROM tally;
SELECT 'xml10 a',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT c.value('(./text())[1]', 'nvarchar(max)') AS [Subject]
    , c.value('(/root/*[sql:column("w.r")]/text())[1]', 'nvarchar(max)') AS [Score]
INTO #t10b
FROM @xml10.nodes('/root/*[position() mod 2 = 1]') AS t(c)
    CROSS APPLY (SELECT t.c.value('let $n := . return count(/root/*[. << $n[1]]) + 2','INT') AS r
         ) AS w;
SELECT 'xml10 b',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

--test for 100

SET @d =SYSUTCDATETIME();
WITH tally(Nmbr) AS(SELECT TOP(@xml100.value('count(/root/Subject)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT tally.Nmbr
      ,@xml100.value('(/root/Subject[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS [Subject] 
      ,@xml100.value('(/root/Score[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS Score 
INTO #t100a
FROM tally;
SELECT 'xml100 a',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT c.value('(./text())[1]', 'nvarchar(max)') AS [Subject]
    , c.value('(/root/*[sql:column("w.r")]/text())[1]', 'nvarchar(max)') AS [Score]
INTO #t100b
FROM @xml100.nodes('/root/*[position() mod 2 = 1]') AS t(c)
    CROSS APPLY (SELECT t.c.value('let $n := . return count(/root/*[. << $n[1]]) + 2','INT') AS r
         ) AS w;
SELECT 'xml100 b',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

--test for 1000

SET @d =SYSUTCDATETIME();
WITH tally(Nmbr) AS(SELECT TOP(@xml1000.value('count(/root/Subject)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT tally.Nmbr
      ,@xml1000.value('(/root/Subject[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS [Subject] 
      ,@xml1000.value('(/root/Score[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS Score 
INTO #t1000a
FROM tally;
SELECT 'xml1000 a',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT c.value('(./text())[1]', 'nvarchar(max)') AS [Subject]
    , c.value('(/root/*[sql:column("w.r")]/text())[1]', 'nvarchar(max)') AS [Score]
INTO #t1000b
FROM @xml1000.nodes('/root/*[position() mod 2 = 1]') AS t(c)
    CROSS APPLY (SELECT t.c.value('let $n := . return count(/root/*[. << $n[1]]) + 2','INT') AS r
         ) AS w;
SELECT 'xml1000 b',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

Method a is my approach using a tally, method b is Yitzhak's approach using XQuery.

The difference between these two approaches is rather small

  10 Elements a=7ms     / b=6ms
 100 Elements a=83ms    / b=79ms
1000 Elements a=8942ms  / b=8721ms

Some general differences:

  • The tally-approach would work with triples or more elements per serie as well.
  • The tally approach would still work with other elements in between
  • the XQuery approach would deal better with unexpectedly missing elements, but both approaches would not return correctly, if just one expected element was missing.
Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • Shnugo, I completely agree with you on the input XML structure. It is very fragile. It seems like a some course exercise on how to pivot XML structure based on odd vs. even position in a sequence. – Yitzhak Khabinsky Sep 10 '20 at 12:49
  • Shnugo, I up-voted your solution too. My benchmark on the tiny input XML with three subjects provided the following outcome: my method #2 vs. your approach: Query cost (relative to the batch): 39% vs 61%, in favor of method #2 – Yitzhak Khabinsky Sep 10 '20 at 13:07
  • Shnugo, please benchmark it on your end. – Yitzhak Khabinsky Sep 10 '20 at 13:13
  • @YitzhakKhabinsky, see my update section. I had expected, that your approach would get slow with many elements, but it is almost equal... – Shnugo Sep 11 '20 at 08:33
  • Shnugo, thanks a lot for for the effort, benchmark, and great explanation. – Yitzhak Khabinsky Sep 11 '20 at 12:07
1

Without a link between the <Subject> and the <Score> tag, you could try this. The row number that is generated as a link between both tags relies on the SQL engine to return the rows in the proper order.

with cte_sub as
(
  select row_number() over(order by x.Sub) as Num,
         x.Sub.value('.', 'nvarchar(10)') as Subject
  from @xml2.nodes('/Student/Marks/Subject') as x(Sub)
),
cte_sco as
(
  select row_number() over(order by y.Sco) as Num,
         y.Sco.value('.', 'int') as Score
  from @xml2.nodes('/Student/Marks/Score') as y(Sco)
)
select c1.Subject, c2.Score
from cte_sub c1
join cte_sco c2
  on c2.Num = c1.Num;

Fiddle

Sander
  • 3,942
  • 2
  • 17
  • 22
  • Xml processing is always ordered...change each (select null) to x.sub & y.sco – lptr Sep 09 '20 at 20:12
  • @lptr, This is new information for me, got any form of official documentation for this claim? Answer updated. – Sander Sep 09 '20 at 20:19
  • 1
    http://dataeducation.com/uniquely-identifying-xml-nodes-with-dense_rank/ – lptr Sep 10 '20 at 08:26