How to get distinct elements with all values from all arrays XML in T-SQL

Question

I need some help trying to import the following XML document

<ROOT>
<MeasuresList>
  <Measure ID="164">       
    <Names>
    <Name ID="1072687" langid="33"/>
    <Name ID="1017672" langid="13">miljoner skär</Name>
   </Names>
  </Measure>
  <Measure ID="362">       
   <Names>
    <Name ID="1072687" langid="33"/>
    <Name ID="1017672" langid="13">miljoner skär</Name>
    <Name ID="4068857" langid="19">hyller</Name>
    <Name ID="3330057" langid="18">ράφια</Name>
    <Name ID="3291105" langid="20">raflar</Name>
    <Name ID="2813622" langid="10"/>         
   </Names>       
  </Measure>
  <Measure ID="162">       
   <Names>
    <Name ID="1072687" langid="33"/>
    <Name ID="1017672" langid="13">miljoner skär</Name>
    <Name ID="1072427" langid="36"/>
    <Name ID="438237" langid="20">sayfa başına geçen dakika</Name>         
   </Names>       
  </Measure>
</MeasuresList>

Into a dbo.table (ID int PRIMARY KEY, langid int FK, Value NVARCHAR(MAX)) I am using

ALTER PROCEDURE [dbo].[AddNewMeasuresList] 
(
@XmlData XML
)

AS

DECLARE @doc int;
EXEC sp_xml_preparedocument @doc OUTPUT, @XmlData

BEGIN TRANSACTION T1
Insert Measure_Name
SELECT *
FROM OPENXML(@doc,'//ROOT/MeasuresList/Measure/Names/Name') WITH Measure_Name AS mes
WHERE NOT EXISTS
(   
    SELECT 1 FROM dbo.Measure_Name WHERE ID=mes.ID
)
COMMIT TRANSACTION T1
EXEC sp_xml_removedocument @doc

However i'm getting an exception in my c# application

System.Data.SqlClient.SqlException (0x80131904): 
Violation of PRIMARY KEY constraint 'PK_Measure_Name'. 
Cannot insert duplicate key in object 'dbo.Measure_Name'. The duplicate key value is (1072687)

The dbo.Measure_Name Table is empty but there are duplicate entries in other Measure XML nodes with the same Name ID.

Since my SELECT statement returns all "Name" which results in duplicate PRIMARY KEYS, how do i change the select statement to return all values but with only DISTINCT/UNIQUE ID's?

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

Your own solution might work, but this can be done better...

FROM OPENXML together with the SPs to prepare and to remove the document is outdated and should not be used any more. Rather use the native XML support.

Try this and adapt it to your needs:

Your XML

DECLARE @xml XML=
N'<ROOT>
  <MeasuresList>
    <Measure ID="164">
      <Names>
        <Name ID="1072687" langid="33" />
        <Name ID="1017672" langid="13">miljoner skär</Name>
      </Names>
    </Measure>
    <Measure ID="362">
      <Names>
        <Name ID="1072687" langid="33" />
        <Name ID="1017672" langid="13">miljoner skär</Name>
        <Name ID="4068857" langid="19">hyller</Name>
        <Name ID="3330057" langid="18">ράφια</Name>
        <Name ID="3291105" langid="20">raflar</Name>
        <Name ID="2813622" langid="10" />
      </Names>
    </Measure>
    <Measure ID="162">
      <Names>
        <Name ID="1072687" langid="33" />
        <Name ID="1017672" langid="13">miljoner skär</Name>
        <Name ID="1072427" langid="36" />
        <Name ID="438237" langid="20">sayfa başına geçen dakika</Name>
      </Names>
    </Measure>
  </MeasuresList>
</ROOT>';

--My target table will include the @ID found in <Measure>, don't know if you need this...

DECLARE @target TABLE(MeasureID int, NameID int, [langid] int, [Value] NVARCHAR(MAX))

--The data will use the table @target as staging table
--The query uses .nodes(), APPLY and .value to read the values out of your XML

INSERT INTO @target(MeasureID,NameID,[langid],[Value]) 
SELECT m.value(N'@ID',N'int') AS MeasureID
      ,n.value(N'@ID',N'int') AS NameID
      ,n.value(N'@langid',N'int') AS [langid]
      ,n.value(N'text()[1]',N'nvarchar(max)') AS [Value]
FROM @xml.nodes(N'/ROOT/MeasuresList/Measure') AS A(m)
OUTER APPLY A.m.nodes(N'Names/Name') AS B(n);

--this returns the full (denormalized) content

SELECT * FROM @target;

After this you can use any kind of GROUP BY, aggregation, whatever you might need, to shift this into your target tables.

I'd suggest a structure with at least two related tables reflecting your 1:n related Measure/Name relation.

Some explanation

Using FROM @xml.nodes(N'/ROOT/MeasuresList/Measure') AS A(m) will result in a derived table of all (repeating) <Measure> elements. This "table" is called A, while the "column" is called m (for "measure"). This column is an XML fragment representing one single <Measure> per row.

With OUTER APPLY A.m.nodes(N'Names/Name') AS B(n) I take this derived table A, use its column m and again .nodes() to dive down to the (repeated) <Name> elements.

The actual values are taken with .value(). This method takes an XPath/XQuery in the first place and the target type in the second.

Hope this is clear now... :-D

Thanks i didn't see your answer when i last commented, not sure if it was there. I started to use the inbuilt xml, however i couldn't wrap my head around it and went back to openxml. I will give this a try over the next few days. — Eddie Ted Crocombe, Apr 17 '18 at 10:24
@EddieTedCrocombe Well, I added some explanation... You see, I'm really trying hard to convince you :-D — Shnugo, Apr 17 '18 at 10:38
Yeah that makes more sense. In value() when should the @ and [1] be used int the XPah? i notice you used it for the text() but not for the attributes? — Eddie Ted Crocombe, Apr 17 '18 at 11:22
@EddieTedCrocombe, `.value()` needs a *singleton* guarantee. While an attribute will never be found a second time (names must be unique) an element with the same name might exist. `text()` is just a special *node* below its containing element. Just imagine `text1text2`. One element may have more than one `text()`. Therefore the `[1]`. — Shnugo, Apr 17 '18 at 11:29
Cheers, you've twisted my arm, I'll be using xmlnodes from now on, thanks again for your time @Shnugo — Eddie Ted Crocombe, Apr 17 '18 at 11:32
I was lost, but now am found... [i have another if your interested...?](https://stackoverflow.com/questions/49877372/stream-a-parameter-to-t-sql-sp-from-c) @Shnugo — Eddie Ted Crocombe, Apr 17 '18 at 11:53

score 0 · Answer 2 · answered Apr 17 '18 at 08:59

Posting my found answer, I added a new WHERE Clause which compares the current ID with a list/array/table of DISTINCT ID

ALTER PROCEDURE [dbo].[AddNewMeasuresList] 
(
@XmlData XML
)

AS

DECLARE @doc int;
EXEC sp_xml_preparedocument @doc OUTPUT, @XmlData

BEGIN TRANSACTION T1
Insert Measure_Name
SELECT  *
FROM OPENXML(@doc,'//ROOT/MeasuresList/Measure/Names/Name') 
WITH 
(
    ID int,
    langid int,
    Value nvarchar(MAX) 'text()'
) AS mn
WHERE 
(
    mn.ID in (SELECT DISTINCT ID FROM OPENXML(@doc,'////ROOT/MeasuresList/Measure/Names/Name'))
)
AND NOT EXISTS
(   
    SELECT 1 FROM dbo.Measure_Name WHERE ID=mn.ID
)
COMMIT TRANSACTION T1
EXEC sp_xml_removedocument @doc

Can you offer an alternative @Shnugo? – Eddie Ted Crocombe Apr 17 '18 at 10:10 — Eddie Ted Crocombe, Apr 17 '18 at 10:10

How to get distinct elements with all values from all arrays XML in T-SQL

2 Answers2

Some explanation