0

I have a set of tables (with several one-many relationships) that form a single "unit". I need to ensure that we weed out duplicates, but determining duplicates requires consideration of all the data.

To make matters worse, the DB in question is still in Sql 2000 compatibility mode, so it can't use any newer features.

Create Table UnitType
(
  Id int IDENTITY Primary Key,
  Action int not null,
  TriggerType varchar(25) not null
)

Create Table Unit
(
  Id int IDENTITY Primary Key,
  TypeId int Not Null,
  Message varchar(100),
  Constraint FK_Unit_Type Foreign Key (TypeId) References UnitType(Id)
 )

Create Table Item
(
  Id int IDENTITY Primary Key,
  QuestionId int not null,
  Sequence int not null
)

Create Table UnitCondition
(
  Id int IDENTITY Primary Key,
  UnitId int not null,
  Value varchar(10),
  ItemId int not null
  Constraint FK_UnitCondition_Unit Foreign Key (UnitId) References Unit(Id),
  Constraint FK_UnitCondition_Item Foreign Key (ItemId) References Item(Id)
)

Insert into Item (QuestionId, Sequence)
Values (1, 1),
(1, 2)

Insert into UnitType(Action, TriggerType)
Values (1, 'Changed')

Insert into Unit (TypeId, Message)
Values (1, 'Hello World'),
(1, 'Hello World')

Insert into UnitCondition(UnitId, Value, ItemId)
Values (1, 'Test', 1),
(1, 'Hello', 2),
(2, 'Test', 1),
(2, 'Hello', 2)

I've created a SqlFiddle demonstrating a simple form of this issue.

A Unit is considered a Duplicate with all (non-Id) fields on the Unit, and all conditions on that Unit combined are exactly matched in every detail. Considering it like Xml - A Unit Node (containing the Unit info, and a Conditions sub-collection) is unique if no other Unit node exists that is an exact string copy

Select
  Action, 
  TriggerType,
  U.TypeId,
  U.Message,
  (
      Select C.Value, C.ItemId, I.QuestionId, I.Sequence
      From UnitCondition C
        Inner Join Item I on C.ItemId = I.Id
      Where C.UnitId = U.Id
      For XML RAW('Condition')
  ) as Conditions
from UnitType T
  Inner Join Unit U on T.Id = U.TypeId
For XML RAW ('Unit'), ELEMENTS

But the issue I have is that I can't seem to get the XML for each Unit to appear as a new record, and I'm not sure how to compare the Unit Nodes to look for Duplicates.

How Can I run this query to determine if there are duplicate Xml Unit nodes within the collection?

Obsidian Phoenix
  • 4,083
  • 1
  • 22
  • 60
  • Can you add your desired output? – Tanner Aug 29 '14 at 14:24
  • Will do, although the output format I'm not so fussed about, so long as I can compare two units and identify uniqueness - whether that is a single string with all values, or a large table with all columns for all rows combined, I don't mind. – Obsidian Phoenix Aug 29 '14 at 14:26
  • That's fine, but based on the output your fiddle generates, I'm not sure how you are identifying uniqueness, so if you can hand craft what the output should look like, you might get more help generating that output. – Tanner Aug 29 '14 at 14:31
  • did you try this - http://stackoverflow.com/questions/10381512/what-this-query-does-to-create-comma-delimited-list-sql-server/10381975#10381975 – Bulat Aug 29 '14 at 15:18

3 Answers3

0

If you want to determine whether record is duplicate or not, you don't need to combine all values into one string. You can do this with ROW_NUMBER function like this:

SELECT  
  Action, 
  TriggerType,
  U.Id,
  U.TypeId,
  U.Message,
  C.Value,
  I.QuestionId,
  I.Sequence,
  ROW_NUMBER () OVER (PARTITION BY <LIST OF FIELD THAT SHOULD BE UNIQUE> 
                      ORDER BY <LIST OF FIELDS>) as DupeNumber
FROM UnitType T
  Inner Join Unit U on T.Id = U.TypeId
  Inner Join UnitCondition C on U.Id = C.UnitId
  Inner Join Item I on C.ItemId = I.Id;

If DupeNumber is greater than 1, then record id a duplicate.

Bulat
  • 6,869
  • 1
  • 29
  • 52
  • Thats not really going to work with a variable amount of records in Conditions though, is it? I need to know if the *whole* unit is unique, not a particular record within it. – Obsidian Phoenix Aug 29 '14 at 14:49
  • what do you mean by "variable amount of records"? you only include fields that represent the Primary Key for you in the list. – Bulat Aug 29 '14 at 14:53
  • A unit could have any number of UnitCondition records. It's the whole collection that needs to be unique: Multiple Units might have a given record thats identical, but it shouldn't be identical in every regard across all records in the Unit. – Obsidian Phoenix Aug 29 '14 at 14:55
  • So if two units are referenced by same UnitConditions they are dupes? – Bulat Aug 29 '14 at 14:57
  • No. Perhaps it's easier to think about it as Xml. I've updated my question with an XML query a `Unit` node is considered a dupe if there is any other `Unit` node that is identical in all respects. – Obsidian Phoenix Aug 29 '14 at 15:00
0

give this a try
this would find the pairs not unique
how to build that into you final answer - not sure - but possibly a start

select u1.id, u2.id 
  from unit as u1 
  join unit as u2 
    on ui.ID < u2.id 
  join UnitCondition uc1 
    on uc1.unitID = u1.ID 
  full outer join uc2
    on uc2.unitID = u2.ID  
   and uc2.itemID = uc1.itemID 
 where uc2.itemID is null or uc1.itemID is null 
paparazzo
  • 44,497
  • 23
  • 105
  • 176
0

So, I managed to figure out what I needed to do. It's a little clunky though.

First, you need to wrap the Xml Select statement in another select against the Unit table, in order to ensure that we end up with xml representing only that unit.

Select
Id,
(
  Select
    Action, 
    TriggerType,
    IU.TypeId,
    IU.Message,
    (
        Select C.Value, I.QuestionId, I.Sequence
        From UnitCondition C
          Inner Join Item I on C.ItemId = I.Id
        Where C.UnitId = IU.Id
        Order by C.Value, I.QuestionId, I.Sequence
        For XML RAW('Condition'), TYPE
    ) as Conditions
  from UnitType T
    Inner Join Unit IU on T.Id = IU.TypeId
  WHERE IU.Id = U.Id
  For XML RAW ('Unit')
)
From Unit U

Then, you can wrap this in another select, grouping the xml up by content.

Select content, count(*) as cnt
From
  (
    Select
      Id,
      (
        Select
          Action, 
          TriggerType,
          IU.TypeId,
          IU.Message,
          (
              Select C.Value, C.ItemId, I.QuestionId, I.Sequence
              From UnitCondition C
                Inner Join Item I on C.ItemId = I.Id
              Where C.UnitId = IU.Id
              Order by C.Value, I.QuestionId, I.Sequence
              For XML RAW('Condition'), TYPE
          ) as Conditions
        from UnitType T
          Inner Join Unit IU on T.Id = IU.TypeId
        WHERE IU.Id = U.Id
        For XML RAW ('Unit')
      ) as content
    From Unit U
  ) as data
group by content
having count(*) > 1

This will allow you to group entire units where the whole content is identical.

One thing to watch out for though, is that to test "uniqueness", you need to guarantee that the data on the inner Xml selection(s) is always the same. To that end, you should apply ordering on the relevant data (i.e. the data in the xml) to ensure consistency. What order you apply doesn't really matter, so long as two identical collections will output in the same order.

Obsidian Phoenix
  • 4,083
  • 1
  • 22
  • 60