
I have the following table (expressed as a Code First POCO class)

Public Class JoinTable
   Public Property ParentId As Integer
   Public Property ChildId As Integer
End Class

Now I want to formulate a requirement that, for any ParentId, the associated set of ChildIds should be unique across parents.

So, records like
ParentId:1, ChildId: 11
ParentId:1, ChildId: 12
ParentId:2, ChildId: 11

are valid, but adding a record with ParentId: 2, ChildId: 12 would be invalid, as ParentIds 1 and 2 would then have identical sets of ChildIds.

How can this be done (on SQL Server LocalDB)? Can it be done using Entity Framework?

Dabblernl
  • So, considering the sets of all children associated with each parent, each of those sets should be distinct? If we wanted to add `(2,12)` and `(2,13)` at the same time, that would be allowed? – Damien_The_Unbeliever Jun 17 '14 at 07:22
  • @Damien_The_Unbeliever yes it would as after this operation the ParentId of 2 would be associated with the ChildIds 11,12,13 – Dabblernl Jun 17 '14 at 07:25
  • Hmm. Thought so. It's a struggle to work out if that can even be expressed in SQL, let alone whether EF could be persuaded to create such a constraint. – Damien_The_Unbeliever Jun 17 '14 at 07:26
  • I would say it's business logic time; I do not believe this can be done without coding, even if it gets done within a SQL trigger function. – Rand Random Jun 17 '14 at 07:53
  • Business logic: bad idea! Triggers: bad idea! Best solution: hash column with unique key. Please, see my answer. – JotaBe Jun 17 '14 at 08:47
  • Take a look at this question http://stackoverflow.com/questions/18714550/combination-of-two-field-must-be-unique-in-entity-framework-code-first-approach – Colin Jun 17 '14 at 12:07
  • Putting this another way. It looks like a child should only have one parent. Why are you modeling this as a many-to-many rather than a one-to-many? – Colin Jun 17 '14 at 12:21
  • @Colin - that was my initial thought, but notice that in their sample, `(1,11)` and `(2,11)` *are* valid. And we can even have `(2,12)`, even though there's an `(1,12)`, *provided* that one of the parents also has at least one other child that isn't shared with the other parent. – Damien_The_Unbeliever Jun 17 '14 at 14:23
  • @Damien_The_Unbeliever oh well spotted. So if this was modeled with a Collection of children on the parent then a parent would be invalid if the collection contained exactly the same children as another parent's collection.... – Colin Jun 17 '14 at 15:20
  • I've taken it as a challenge and found a way to represent such sets in a database and have the database enforce the constraints. It's significantly different from your current model and is going to be complex to operate on. But if you really, really want to model this in the database and have it enforced, it may work for you. – Damien_The_Unbeliever Jun 19 '14 at 07:29

2 Answers


I've taken this as a challenge of "can such a relationship be built/enforced in the database" and finally found a way to do so. However, I'd seriously recommend not doing so - it's going to be ugly. And I'd certainly not expect Entity Framework to be able to do anything intelligent with it.

What we can do is to model the sets of children explicitly. We create a special empty set, and then every other set has to derive from a smaller set by adding a new element that is larger than the current largest element in that set. By adding appropriate uniqueness constraints, we can then ensure that every set can only be derived in exactly one way - and if each set then has an optional Parent, there can only ever be one parent for a particular set.

And so here's the database table:

create table Sets (
    SetID uniqueidentifier not null
        constraint DF_Set_IDs DEFAULT (newsequentialid()),
    SubSetID uniqueidentifier null,
    ParentID int null,
    ElementValue int null,
    SubsetElement int null,
    constraint PK_Set PRIMARY KEY (SetID),
    constraint UQ_Set_XRef UNIQUE (SetID,ElementValue),
    constraint CK_Set_Empty CHECK (
        (SetID = '00000000-0000-0000-0000-000000000000'
             and ElementValue is null and SubSetID is null) or
        (SetID <> '00000000-0000-0000-0000-000000000000'
             and ElementValue is not null and SubSetID is not null)
    ),
    constraint CK_Set_LargerElement CHECK (
        SubsetElement < ElementValue
    ),
    constraint FK_Set_SubSet FOREIGN KEY (SubSetID)
         references Sets (SetID),
    constraint FK_Set_SubSet_XRef FOREIGN KEY (SubSetID,SubsetElement) 
         references Sets (SetID,ElementValue),
    constraint CK_Empty_Subsets CHECK (
        (SubSetID = '00000000-0000-0000-0000-000000000000'
             and SubsetElement is null) or
        (SubSetID <> '00000000-0000-0000-0000-000000000000'
             and SubsetElement is not null)
    ),
    constraint UQ_SubSet_And_Current UNIQUE (SubSetID,ElementValue),
    constraint FK_Set_Parent FOREIGN KEY (ParentID) references Parents (ParentID)
)

And then we also want a filtered unique index to ensure a parent only "owns" one set:

create unique index UQ_OwnedSetParents on Sets (ParentID)
  where ParentID is not null

So, let's consider each constraint in turn and what it brings to the party:

  • PK_Set - primary key of this table. I've made the column a uniqueidentifier primarily so that the only ints appearing in this table relate to the numbers present in sets.

  • UQ_Set_XRef - a "superkey" across both the actual primary key and the ElementValue value, allowing a foreign key to check both columns if necessary.

  • CK_Set_Empty - We want exactly one empty set, so we pick a special value for that row's SetID and then ensure that it's the only set that gets to have a NULL ElementValue and to not be based on a subset.

  • CK_Set_LargerElement - We want to ensure that as we build up sets, each set's ElementValue is larger than the largest value in the subset it's based upon. By induction, the largest element in the subset was the Subset's ElementValue, of which more follows.

  • FK_Set_SubSet - A foreign key linking this set to a subset it is based upon

  • FK_Set_SubSet_XRef - this foreign key ensures that the value we have in SubsetElement was the ElementValue for that subset row.

  • CK_Empty_Subsets - this check is needed because foreign key references aren't checked if one of the columns contains a NULL - so we have this check to back up FK_Set_SubSet_XRef and to make sure that the empty set is correctly dealt with.

  • UQ_SubSet_And_Current - this ensures that for any given subset, at most one set builds on it using a particular element value - this is what finally makes each set uniquely representable (the walkthrough after this list shows it in action)

  • FK_Set_Parent - an anticlimax at this point - the set is optionally owned by a parent

  • UQ_OwnedSetParents - and this ensures that each parent "owns" at most one set.
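
To make the mechanics concrete, here is a possible walkthrough with the data from the question. It's only a sketch: it assumes a Parents table that already contains ParentIDs 1 and 2; everything else comes from the table above.

-- The one and only empty set (the special SetID required by CK_Set_Empty).
insert into Sets (SetID, SubSetID, ParentID, ElementValue, SubsetElement)
values ('00000000-0000-0000-0000-000000000000', null, null, null, null);

-- The set {11}, built from the empty set and owned by parent 2.
declare @s11 uniqueidentifier = newid();
insert into Sets (SetID, SubSetID, ParentID, ElementValue, SubsetElement)
values (@s11, '00000000-0000-0000-0000-000000000000', 2, 11, null);

-- The set {11,12}, built from {11} and owned by parent 1 (SetID comes from the default).
insert into Sets (SubSetID, ParentID, ElementValue, SubsetElement)
values (@s11, 1, 12, 11);

-- Any second row representing {11,12} (same SubSetID, same ElementValue) would now
-- violate UQ_SubSet_And_Current, so no other parent can ever own an identical set.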

As you can probably see, operating on this table is going to be somewhat complex. To enumerate the members of a set, you need to visit the specific set and then keep following the subset on which each set is built until you reach the empty set.
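
For instance, a sketch of that enumeration as a recursive CTE, here listing the members of the set owned by ParentID 1:

with SetMembers as (
    select s.SetID, s.SubSetID, s.ElementValue
    from Sets s
    where s.ParentID = 1                          -- start at the set owned by parent 1
    union all
    select sub.SetID, sub.SubSetID, sub.ElementValue
    from Sets sub
    join SetMembers m on m.SubSetID = sub.SetID   -- step down to the subset it was built from
)
select ElementValue
from SetMembers
where ElementValue is not null                    -- the empty set's row carries no element
order by ElementValue;
-- With the walkthrough data above this returns 11 and 12.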

Adding a new element to a set is straightforward if it's larger than any existing element. Adding a new element that is smaller is considerably more complex, especially if some of the subsets are owned by other parents.
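
As a sketch of the straightforward case, this is roughly what "add the element 13 to parent 2's set" could look like, assuming 13 is larger than anything already in that set:

begin transaction;

declare @current uniqueidentifier, @currentTop int;
select @current = SetID, @currentTop = ElementValue
from Sets
where ParentID = 2;

-- Release the old set first (UQ_OwnedSetParents), then take ownership of the larger one.
update Sets set ParentID = null where SetID = @current;

insert into Sets (SubSetID, ParentID, ElementValue, SubsetElement)
values (@current, 2, 13, @currentTop);
-- If a row for that exact set already exists (owned by another parent or unowned),
-- this insert fails on UQ_SubSet_And_Current - which is the constraint doing its job.

commit;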

Damien_The_Unbeliever

You can

  1. enforce it in the application logic
  2. implement it with SQL Server triggers. These triggers should be implemented on the "join" table of the many-to-many relationship
  3. create an extra column with a hash of the children ids, and make it unique. For hashing you can concatenate all the children ids, always in the same order, and use any of the cryptographic hash functions, like SHA1.

The first solution sounds good. It looks like this business rule is enforced in the business layer, and that looks fine. But it's not viable if you have a lot of parents with children. Imagine loading all the groups of children from the DB to test if the new group is repeated (whenever you add/remove a child). That's crazy.

The obvious SQL Server way is the second solution. But, if there are a lot of parents, you'll have the same problem as the first solution: it will be done on the DB side, but you still have to check all the groups. Besides, you've hidden this rule in the DB, which can make your code base less clear.
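
For completeness, here is a rough sketch of what such a trigger might look like on the question's join table. The trigger name, error message and exact table name are assumptions; it re-checks every parent touched by a statement against every other parent, which is exactly the cost described above:

create trigger TR_JoinTable_UniqueChildSets
on JoinTable
after insert, update, delete
as
begin
    if exists (
        select 1
        from (select ParentId from inserted union select ParentId from deleted) touched
        join (select distinct ParentId from JoinTable) other
          on other.ParentId <> touched.ParentId
        -- two parents violate the rule when neither child set has an element the other lacks
        where not exists (select ChildId from JoinTable a where a.ParentId = touched.ParentId
                          except
                          select ChildId from JoinTable b where b.ParentId = other.ParentId)
          and not exists (select ChildId from JoinTable b where b.ParentId = other.ParentId
                          except
                          select ChildId from JoinTable a where a.ParentId = touched.ParentId)
    )
    begin
        rollback transaction;
        throw 50000, 'Two parents would end up with identical sets of children.', 1;
    end
end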

The third solution is the best from the performance and architecture viewpoints: it offers the best performance, and the hash column on the app side doesn't hide the implementation of this rule from the code (although you can say it's not in the BL where it belongs, that's relative: if you're using DDD this is a perfect domain model). To implement it you need to:

  • create a new column, with an unique index, to hold the hashes
  • ensure that this column is calculated when writing the entity. One way to do it is to make a read-only property which returns the calculated hash from the children (a purely illustrative sketch of such a hash follows below). This property should require the children to be present (i.e. throw an exception if the children collection is not initialized), which would help to enforce this business rule. If you make the hash property read-only from the application's point of view (e.g. with a private setter) it will be written to the DB, but it won't be read. That's ideal: it's meaningless on the app side.
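
Purely to illustrate what that hash can be, here is a sketch in T-SQL using the question's table and column names (the same value would normally be computed in the application when saving): concatenate the parent's ChildIds in a fixed order and hash the result with SHA1.

-- Child ids of parent 1, sorted and comma-separated.
declare @childList varchar(max) =
    (select cast(ChildId as varchar(20)) + ','
     from JoinTable
     where ParentId = 1
     order by ChildId
     for xml path(''));

-- 40-character hex string suitable for a unique ChildrenHash column.
-- (Note: HASHBYTES input is limited to 8000 bytes before SQL Server 2016.)
select convert(char(40), hashbytes('SHA1', @childList), 2) as ChildrenHash;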

If you're using EF 6.1 or later you can create the unique index with an attribute like this:

[Index("ChildrenHash", unique: true)]
public string ChildrenHash
{
    get { /* Calculate and return children hash / exception if missing */  }
}

In previous versions you need to create the unique key in the DB initialization (Seed) or by customizing migrations, if you're using them.
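
For those versions, a sketch of the statement you could issue from Seed or from a migration's Sql() call (table and column names are assumptions):

-- ChildrenHash needs a bounded length (e.g. nvarchar(40)) to be usable as an index key.
create unique index UQ_Parents_ChildrenHash on Parents (ChildrenHash);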

JotaBe
  • Thanks, that is very useful! However, creating unique hashes is a non-trivial task, as the ChildIds are in fact 64-bit integers. It seems to me that a *compare HashCode -> call Equals* approach is necessary. That would bring us back to business logic. I do not think that this has to be so very resource intensive, as you can load all the Ids in memory and write back the new records after editing. – Dabblernl Jun 17 '14 at 09:08
  • I don't know how many parents with children you have in the database, but do you mean retrieving all the groups, calculating the hashes of all of them in memory and comparing them? Don't discard the extra column. You can still move it to the business layer if that's a concern: create a BL method that checks if the new hash exists in the DB. That's much faster, and easy to implement. (I'm not sure I understood your comment) – JotaBe Jun 17 '14 at 09:42