I need to select the count of unique value combinations of column B in an XRef table which is grouped by column A.
Consider the following schema and data, which represents a simple family structure. Each child has a father and mother:
TABLE Father
FatherID | Name |
---|---|
1 | Alex |
2 | Bob |
TABLE Mother
MotherID | Name |
---|---|
1 | Alice |
2 | Barbara |
TABLE Child
ChildID | FatherID | MotherID | Name |
---|---|---|---|
1 | 1 (Alex) | 1 (Alice) | Adam |
2 | 1 (Alex) | 1 (Alice) | Billy |
3 | 1 (Alex) | 2 (Barbara) | Celine |
4 | 2 (Bob) | 2 (Barbara) | Derek |
The distinct combinations of mothers for each father are:
- Alex (Alice, Barbara)
- Bob (Barbara)
In all there are two distinct combinations of mothers:
- Alice, Barbara
- Barbara
The query I want to write would return the count of those distinct combinations of mother, regardless of which father they are associated with:
UniqueMotherGroups |
---|
2 |
I was able to do this successfully using the STRING_AGG function, but it feels clunky. It also needs to operate over millions of rows and is quite slow at the moment. Is there a more idiomatic way to do this with set operations instead?
Here is my working example:
-- Drop pre-existing tables
DROP TABLE IF EXISTS dbo.Child;
DROP TABLE IF EXISTS dbo.Father;
DROP TABLE IF EXISTS dbo.Mother;
-- Create family tables.
CREATE TABLE dbo.Father
(
FatherID INT NOT NULL
, Name VARCHAR(50) NOT NULL
);
ALTER TABLE dbo.Father
ADD CONSTRAINT PK_Father
PRIMARY KEY CLUSTERED (FatherID);
ALTER TABLE dbo.Father SET (LOCK_ESCALATION = TABLE);
CREATE TABLE dbo.Mother
(
MotherID INT NOT NULL
, Name VARCHAR(50) NOT NULL
);
ALTER TABLE dbo.Mother
ADD CONSTRAINT PK_Mother
PRIMARY KEY CLUSTERED (MotherID);
ALTER TABLE dbo.Mother SET (LOCK_ESCALATION = TABLE);
CREATE TABLE dbo.Child
(
ChildID INT NOT NULL
, FatherID INT NOT NULL
, MotherID INT NOT NULL
, Name VARCHAR(50) NOT NULL
);
ALTER TABLE dbo.Child
ADD CONSTRAINT PK_Child
PRIMARY KEY CLUSTERED (ChildID);
CREATE NONCLUSTERED INDEX IX_Parents ON dbo.Child (FatherID, MotherID);
ALTER TABLE dbo.Child
ADD CONSTRAINT FK_Child_Father
FOREIGN KEY (FatherID)
REFERENCES dbo.Father (FatherID);
ALTER TABLE dbo.Child
ADD CONSTRAINT FK_Child_Mother
FOREIGN KEY (MotherID)
REFERENCES dbo.Mother (MotherID);
-- Insert two children with the same parents
INSERT INTO dbo.Father
(
FatherID
, Name
)
VALUES
(1, 'Alex')
, (2, 'Bob')
, (3, 'Charlie')
INSERT INTO dbo.Mother
(
MotherID
, Name
)
VALUES
(1, 'Alice')
, (2, 'Barbara');
INSERT INTO dbo.Child
(
ChildID
, FatherID
, MotherID
, Name
)
VALUES
(1, 1, 1, 'Adam')
, (2, 1, 1, 'Billy')
, (3, 1, 2, 'Celine')
, (4, 2, 2, 'Derek')
, (5, 3, 1, 'Eric');
-- CTE Gets distinct combinations of parents
WITH distinctParentCombinations (FatherID, MotherID)
AS (SELECT children.FatherID
, children.MotherID
FROM dbo.Child as children
GROUP BY children.FatherID
, children.MotherID
)
-- CTE Gets uses STRING_AGG to get unique combinations of mothers.
, motherGroups (Mothers)
AS (SELECT STRING_AGG(CONVERT(VARCHAR(MAX), distinctParentCombinations.MotherID), '-') WITHIN GROUP (ORDER BY distinctParentCombinations.MotherID) AS Mothers
FROM distinctParentCombinations
GROUP BY distinctParentCombinations.FatherID
)
-- Remove the COUNT function to see the actual combinations
SELECT COUNT(motherGroups.Mothers) AS UniqueMotherGroups
FROM motherGroups
-- Clean up the example
DROP TABLE IF EXISTS dbo.Child;
DROP TABLE IF EXISTS dbo.Father;
DROP TABLE IF EXISTS dbo.Mother;