3

I hope this isn't a duplicate question. I searched high and low on this site for my issue and I didn't find anything, so here's my question...

I'm using XQuery to parse out an XML document into a SQL server result set. Due to some repeating node name/value pairs, I'm getting duplicates. Why am I getting these duplicates?

Any help or assistance is greatly appreciated.

XML:

<root>
  <record>
    <recordid>1</recordid>
    <tests>
      <test name="Food" value="Apple" />
      <test name="Drink" value="Water" />
    </tests>
  </record>
  <record>
    <recordid>2</recordid>
    <tests>
      <test name="Food" value="Banana" />
      <test name="Drink" value="Orange Juice" />
    </tests>
    <tests>
      <test name="Food" value="Steak" />
      <test name="Drink" value="Beer" />
    </tests>
  </record>
</root>

SQL:

DECLARE
    @XML XML = '<root><record><recordid>1</recordid><tests><test name="Food" value="Apple" /><test name="Drink" value="Water" /></tests></record><record><recordid>2</recordid><tests><test name="Food" value="Banana" /><test name="Drink" value="Orange Juice" /></tests><tests><test name="Food" value="Steak" /><test name="Drink" value="Beer" /></tests></record></root>'

DECLARE @XMLTable AS TABLE (
    MyXML XML
)   
INSERT INTO @XMLTable(MyXML)
VALUES(@XML)

SELECT
    rec.c.value('recordid[1]', 'INT') AS RecordId,
    t.c.value('test[@name="Food"][1]/@value[1]', 'VARCHAR(15)') AS Food,
    t.c.value('test[@name="Drink"][1]/@value[1]', 'VARCHAR(15)') AS Drink
FROM @XMLTable AS x
CROSS APPLY MyXML.nodes('/root/record') rec(c)
OUTER APPLY MyXML.nodes('/root/record/tests') t(c)

Results:

RecordId    Food            Drink
----------- --------------- ---------------
1           Apple           Water
1           Banana          Orange Juice
1           Steak           Beer
2           Apple           Water
2           Banana          Orange Juice
2           Steak           Beer

Expected Results:

RecordId    Food            Drink
----------- --------------- ---------------
1           Apple           Water
2           Banana          Orange Juice
2           Steak           Beer
Kirill Polishchuk
  • 54,804
  • 11
  • 122
  • 125
iamtheratio
  • 569
  • 4
  • 9
  • 16

3 Answers3

3

Something like this will do:

SELECT
    rec.c.value('../recordid[1]', 'INT') AS RecordId,
    rec.c.value('test[@name="Food"][1]/@value[1]', 'VARCHAR(15)') AS Food,
    rec.c.value('test[@name="Drink"][1]/@value[1]', 'VARCHAR(15)') AS Drink
FROM @XMLTable 
CROSS APPLY MyXML.nodes('/root/record/tests') rec(c)
Kirill Polishchuk
  • 54,804
  • 11
  • 122
  • 125
  • Ah, thank you! This also works if the or nodes are empty, which is great. I also found another solution by adding a WHERE clause to my original script, I will post it below as another answer as it gives identical results to your query. I'd love to discuss which solution is better. I would think based on performance your solution looks to be better since it's only 1 CROSS APPLY. – iamtheratio May 15 '19 at 10:59
  • @if you run these queries with "Include actual execution plan" you will see that my query is 1% relative to the batch, yours is 99% – Kirill Polishchuk May 15 '19 at 23:05
  • Kirill, talking about performance: Backward navigation with `../` is a known performance killer. You might [read this for details](https://stackoverflow.com/questions/24196516/cross-apply-xml-query-performs-exponentially-worse-as-xml-document-grows/24199428#24199428). My answer provides an approach where you can avoid this... – Shnugo May 16 '19 at 08:00
2

The best way to solve this, was actually this:

SELECT
    rec.value('recordid[1]', 'INT') AS RecordId,
    t.value('test[@name="Food"][1]/@value[1]', 'VARCHAR(15)') AS Food,
    t.value('test[@name="Drink"][1]/@value[1]', 'VARCHAR(15)') AS Drink
FROM @XMLTable AS x
CROSS APPLY MyXML.nodes('/root/record') A(rec)
OUTER APPLY rec.nodes('tests') b(t);

The difference to your own solution is: The second APPLY uses the output of the first APPLY to dive deeper into the returned XML fragment.

You can read it as:

  • Dive into the XML and return each <record> in a single row.
  • Now (think it row wise) take the returned <record> and open the <tests>. This is a relativ path, hence no slashes.
  • the OUTER will allow for empty <tests>

And a general hint: Backward navigation (A Xpath with ../) is a known performance killer. There is a great answer by Mikael Eriksson explaining the background

Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • I did not know this ... A Xpath with ../ ... I have some UDFs to adjust – John Cappelletti May 16 '19 at 12:52
  • @JohnCappelletti The *parent axis* can lead to very bad performance, especially with larger XMLs. I don't know if you found my link in a comment below the other answer. It's worth to read. I'll place this in my answer now... – Shnugo May 16 '19 at 15:15
  • Wow @Shnugo, GREAT info! I will test out your answer tomorrow when I’m in the office but I want you to know I really appreciate the discussion. Performance is a huge priority so thank you again for the link, answer, and extra details. – iamtheratio May 17 '19 at 00:28
  • @Shnugo, got to the office and tested this out. It works great and makes a lot of sense. I'm excited to create a large XML file and do some performance testing just to create a benchmark. Thanks again! – iamtheratio May 17 '19 at 10:59
0

I was able to achieve the expected results by adding a WHERE clause to my original script.

SQL:

SELECT
    rec.c.value('recordid[1]', 'INT') AS RecordId,
    t.c.value('test[@name="Food"][1]/@value[1]', 'VARCHAR(15)') AS Food,
    t.c.value('test[@name="Drink"][1]/@value[1]', 'VARCHAR(15)') AS Drink
FROM @XMLTable AS x
CROSS APPLY MyXML.nodes('/root/record') rec(c)
CROSS APPLY MyXML.nodes('/root/record/tests') t(c)
WHERE rec.c.value('recordid[1]', 'INT') = t.c.value('../recordid[1]', 'INT')

Results:

RecordId    Food            Drink
----------- --------------- ---------------
1           Apple           Water
2           Banana          Orange Juice
2           Steak           Beer
iamtheratio
  • 569
  • 4
  • 9
  • 16