0

In my database, I've created the following table:

CREATE TABLE [Category_Dim]
(
    [Id]                  INT NOT NULL PRIMARY KEY
    ,[__ParentCategoryId] INT 
    ,[Name]               VARCHAR(250)

    ,CONSTRAINT [FK1] FOREIGN KEY ([__ParentCategoryId]) REFERENCES [dbo].[Category_Dim] ([Id])
)

This allows me to store multiple different kinds of categorical (nested) lists with a root having __ParentCategoryId = NULL and, then, have children entered as follows, for example:

INSERT INTO Category_Dim (Id, __ParentCategoryId, Name) VALUES
    (1, NULL, 'Dog Breeds'),
    (2, NULL, 'Bird Types'),

    (3, 1, 'Chihuahua'),
    (4, 1, 'Pug'),
    (5, 1, 'Pit Bull'),

    (6, 2, 'Macaw'),
    (7, 2, 'Finch'),
    ... etc

In other words, in this case, Ids 3, 4 & 5 are children of 1 (different dog breeds) and 6 & 7 are children of 2 (types of birds).


Now, suppose I am trying to create a second table where I want to only allow dog breeds (children of Id = 1) as values in a column and an error otherwise.

So far, I have the following definition:

CREATE TABLE [Trainers]
(
    [TrainerId] INT NOT NULL PRIMARY KEY IDENTITY (1, 1)
    ,[__DogBreedId] INT NOT NULL
    
    , ...

    ,CONSTRAINT [FK_DogBreeds] FOREIGN KEY ([__DogBreedId]) REFERENCES [dbo].[Category_Dim] ([Id])
)

This has the foreign key constraint, but it allows any Id value from Category_Dim as my __DogBreedId, so a person could put in numbers aside from, in this case, 3-5 as I would want.

Is there a way to accomplish this through a foreign key statement? And, if not, what is the best way to do this or is this a bad idea overall?

Thanks!!

John Bustos
  • 19,036
  • 17
  • 89
  • 151
  • You can't do it with a foreig key. The other option is to implement a "pre-insert" trigger to validate the relationship, but it's a bit clunky. It begs the question, why do you want this in the first place? I haven't seen a real life app doing it. – The Impaler May 23 '21 at 01:05
  • 1
    This sounds a lot like the [one description table to rule them all](https://softwareengineering.stackexchange.com/questions/58372/is-the-one-description-table-to-rule-them-all-approach-good). It also sounds similar to EAV - the search for infinite flexibility. – SMor May 23 '21 at 02:02
  • @SMor, I get your point and I'm left wondering the same thing. I do feel this is just a single category table rather than multiple repetitive tables with simple `Id` and `Name` fields, just differing by name, but I do get the issue this creates... This is a tough one. Either way, I did come up with a solution and posted it, but still left wondering if I'm just creating an anti-pattern here. Thanks. – John Bustos May 23 '21 at 02:16
  • @TheImpaler, I updated the question to better explain the question and came up with an answer, but hoping there's a smarter way still to do this. I hope the edit explains my situation better. – John Bustos May 23 '21 at 05:21

2 Answers2

0

What I ended up doing to achieve this was I created a function that returned a BIT based upon the CategoryId provided being a child of a parent CategoryId:

CREATE FUNCTION [dbo].[IsChildOfCategory]
(
    @__CategoryId        INT
    ,@__ParentCategoryId INT
)
RETURNS BIT
AS
    BEGIN
    RETURN CASE
        WHEN EXISTS 
            (
                SELECT Id FROM Category_Dim 
                WHERE __ParentCategoryId = @__ParentCategoryId 
                AND Id = @__CategoryId
            )
            THEN 1
        ELSE 0
    END
    END;
GO

Then, I added the following check constraint to my table definition:

,CONSTRAINT [CHK_IsDogBreed] CHECK ([dbo].[IsChildOfCategory]([__DogBreedId], 1) = 1)

This, together with the Foreign Key constraint, seems to do exactly what I want it to do.

But I'm really left wondering if this is a terrible pattern to use (everything stored as data in a single category table rather than separate DB tables for each type of data), since this leaves me having to hard-code Category Ids like in this check-constraint which exist as data within the DB rather than specific DB objects (In other words, it leaves me needing to seed my DB with very specific values - In this case ensuring Category Id 1 = 'Dog Breeds').

So, it did work, but it does leave me questioning if this is a bad idea.

John Bustos
  • 19,036
  • 17
  • 89
  • 151
  • 1
    If you tree depth is greater than one, than this function doesn't work. You will have to write a CTE. – marcos May 24 '21 at 17:11
  • Also, I don't think it is a problem if you name your table `DogTrainers` and the `Id` for `Dog Breeds` doesn't change. If you are assuming that it will change, why? – marcos May 24 '21 at 17:18
  • @marcos, agreed re: the CTE - Not a big change and easy enough to implement if that happens, but I don't see it being a case in my specific situation. As for the `Id` for `Dog Breeds`, it does change ('Chihuahia', 'Beagle', etc), their **PARENT ID** would be the same (=`Dog Breed`). – John Bustos May 25 '21 at 19:26
0

For a single-level hierarchy (node depth = 1) you may be better off by using supertype-subtype.

If you do need a tree of a variable node depth then consider using a closure table, instead of having the parent_id in the same table.
Closure table stores all paths in a tree, so each ancestor-descendant link is a separate row. This way all ancestor/descendants of a given node are exposed. Closure tables are easy to query, but a bit harder to maintain, so it's a trade-off.

-- Category CAT exists.
--
category {CAT}
      PK {CAT}


-- Data Sample
  (CAT)
------------------------
  ('Dogs')
, ('Big Dogs')
, ('Small Dogs')  
, ('Chihuahua')
, ('Pug')
, ('Pit Bull')
, ('Birds')
, ('Macaw')
, ('Finch')
-- Ancestor ANC has descendant DCS
--
category_tree {ANC, DCS}
           PK {ANC, DCS}

FK1 {ANC} REFERENCES category {CAT}
FK2 {DCS} REFERENCES category {CAT}


-- Data Sample, includes ANC=DCS
   (ANC, DCS)
------------------------
  ('Dogs'       , 'Dogs')
, ('Birds'      , 'Birds')
, ('Dogs'       , 'Big Dogs')
, ('Dogs'       , 'Small Dogs')
, ('Big Dogs'   , 'Big Dogs')
, ('Small Dogs' , 'Small Dogs')
, ('Dogs'       , 'Chihuahua')
, ('Small Dogs' , 'Chihuahua')
, ('Chihuahua'  , 'Chihuahua')
, ('Dogs'       , 'Pug')
, ('Small Dogs' , 'Pug')
, ('Pug'        , 'Pug')
, ('Dogs'       , 'Pit Bull')
, ('Big Dogs'   , 'Pit Bull')
, ('Pit Bull'   , 'Pit Bull')
, ('Birds'      , 'Macaw')
, ('Macaw'      , 'Macaw')
, ('Birds'      , 'Finch')
, ('Finch'      , 'Finch')
-- Trainer TRA trains all descendants of ancestor ANC.
--
trainer {TRA, ANC}
     PK {TRA, ANC}

FK {ANC, ANC} REFERENCES category_tree {ANC, DCS}


-- Data Sample
   (TRA, ANC)
------------------------
  ('Joe'    , 'Dogs')
, ('Jane'   , 'Small Dogs')
, ('Jane'   , 'Finch')
, ('Jill'   , 'Big Dogs')
, ('Jack'   , 'Birds')
, ('John'   , 'Pug')
-- Trainer TRA trains DCS, descendant of ANC.
-- (Resolved to leaf nodes.)
WITH
q_00 AS ( -- leaves only
select ANC, count(1) as cnt 
from category_tree
group by ANC
having count(1) = 1
)
SELECT t.TRA, x.DCS, t.ANC
FROM trainer       AS t
JOIN category_tree AS x ON x.ANC = t.ANC
JOIN q_00 as q ON q.ANC = x.DCS
ORDER BY TRA, t.ANC;
;

Returns:

TRA     DCS          ANC
----------------------------------
Jack'  'Finch'      'Birds'
Jack'  'Macaw'      'Birds'
Jane'  'Finch'      'Finch'
Jane'  'Pug'        'Small Dogs'
Jane'  'Chihuahua'  'Small Dogs'
Jill'  'Pit Bull'   'Big Dogs'
Joe'   'Pit Bull'   'Dogs'
Joe'   'Pug'        'Dogs'
Joe'   'Chihuahua'  'Dogs'
John'  'Pug'        'Pug'

Note:

All attributes (columns) NOT NULL

PK = Primary Key
FK = Foreign Key


SQL to Test

CREATE TABLE category (
  CAT VARCHAR(32) NOT NULL
 
, CONSTRAINT pk_cat PRIMARY KEY (CAT)
);


CREATE TABLE category_tree (
  ANC VARCHAR(32) NOT NULL
, DCS VARCHAR(32) NOT NULL

, CONSTRAINT pk_ctre  PRIMARY KEY (ANC, DCS)

, CONSTRAINT fk1_ctre FOREIGN KEY (ANC)
              REFERENCES category (CAT)

, CONSTRAINT fk2_ctre FOREIGN KEY (DCS)
              REFERENCES category (CAT)
);


CREATE TABLE trainer (
  TRA VARCHAR(32) NOT NULL
, ANC VARCHAR(32) NOT NULL

, CONSTRAINT pk_tra PRIMARY KEY (TRA, ANC)

, CONSTRAINT fk1_tra FOREIGN KEY (ANC, ANC)
        REFERENCES category_tree (ANC, DCS)
);
INSERT INTO category (CAT)
VALUES
  ('Dogs')
, ('Big Dogs')
, ('Small Dogs')  
, ('Chihuahua')
, ('Pug')
, ('Pit Bull')
, ('Birds')
, ('Macaw')
, ('Finch')
;

INSERT INTO category_tree (ANC, DCS)
VALUES
  ('Dogs'       , 'Dogs')
, ('Birds'      , 'Birds')
, ('Dogs'       , 'Big Dogs')
, ('Dogs'       , 'Small Dogs')
, ('Big Dogs'   , 'Big Dogs')
, ('Small Dogs' , 'Small Dogs')
, ('Dogs'       , 'Chihuahua')
, ('Small Dogs' , 'Chihuahua')
, ('Chihuahua'  , 'Chihuahua')
, ('Dogs'       , 'Pug')
, ('Small Dogs' , 'Pug')
, ('Pug'        , 'Pug')
, ('Dogs'       , 'Pit Bull')
, ('Big Dogs'   , 'Pit Bull')
, ('Pit Bull'   , 'Pit Bull')
, ('Birds'      , 'Macaw')
, ('Macaw'      , 'Macaw')
, ('Birds'      , 'Finch')
, ('Finch'      , 'Finch')
;

INSERT INTO trainer (TRA, ANC)
VALUES
  ('Joe'    , 'Dogs')
, ('Jane'   , 'Small Dogs')
, ('Jane'   , 'Finch')
, ('Jill'   , 'Big Dogs')
, ('Jack'   , 'Birds')
, ('John'   , 'Pug')
;

EDIT

If a whole table should be restricted to just one ancestor then you may:

-- Trainer TRA trains dog DCS; (ANC = 'Dogs').
--
dog_trainer {TRA, DSC, ANC}
         PK {TRA, DSC}

FK {ANC, DSC} REFERENCES category_tree {ANC, DCS}

CHECK (ANC = 'Dogs')
Damir Sudarevic
  • 21,891
  • 3
  • 47
  • 71
  • I'm sorry, I get the idea of the closure table, but I'm still lost as to how this helps with the criteria of having the trainer table only allow `dog` types as values to be inputted. Am I missing something? – John Bustos May 23 '21 at 20:21
  • @JohnBustos So the whole trainer table is allowed only one type? – Damir Sudarevic May 23 '21 at 20:26
  • Yes, Damir - For example, in my (stupid) example, a trainer only trains dogs, but would never train birds, so you'd want the DB to kick back an error if a bird-type was entered into the corresponding column. I do now get your structure and will give it more thought. I'm just not a fan of the upkeep relating to this (as well as a structure where the value in the entire column will always be `dogs`) over my current structure where it self-references... I do see how each has it's own advantages and disadvantages. Just have to consider which makes more sense in my current architecture. Thank you! – John Bustos May 23 '21 at 20:46
  • 1
    @JohnBustos, yes having a column with the same values is a trade-off. Same problem exist in supertype/ subtype model. The problem is lack of assertions, as noted in this Q/A: https://stackoverflow.com/questions/65024799/how-to-bind-users-to-different-organizations-universities-companies-with-diffe – Damir Sudarevic May 24 '21 at 17:33