6

Good Day, I would like to implement a T-SQL query for the Set Cover Problem but have been unable to find any hints on how to do this in SQL.

In my case, my table just has two columns (IDnumber and Mut) and I want to find the minimum number of IDNumber to get one of every Mut. I really want to obtain three IDnumbers for every Mut but I figured I better start off with just one because that might be easier.

DECLARE @myTable TABLE (
    IDnumber int,
    Mut varchar(1))

INSERT @myTable VALUES 
 (1,'C'), (1,'N'), (1,'Z'), (1,'M'), (1,'E'), (2,'E'), (3,'B'), (3,'N'), (3,'D'), (3,'K'), 
(3,'W'), (4,'O'), (4,'G'), (4,'N'), (4,'B'), (4,'U'), (4,'C'), (5,'Q'), (5,'H'), (6,'K'), 
(6,'Y'), (6,'M'), (6,'A'), (6,'O'), (6,'U'), (6,'J'), (7,'H'), (7,'U'), (7,'M'), (7,'L'), 
(8,'B'), (8,'K'), (8,'P'), (9,'Y'), (9,'K'), (10,'Z'), (11,'R'), (12,'X'), (12,'R'), 
(12,'O'), (12,'Z'), (4,'C'), (1,'Z'), (4,'S'), (6,'E'), (5,'G'), (4,'C'), (4,'S'), (4,'H'), 
(6,'D'), (7,'W'), (3,'U'), (6,'N'), (7,'Y'), (6,'N'), (6,'F'), (4,'C'), (4,'I'), (7,'P'), 
(10,'H'), (10,'Z'), (10,'S'), (7,'Z'), (6,'B'), (7,'Z'), (8,'X'), (8,'J'), (8,'P'), (10,'K'), 
(8,'K'), (12,'P'), (8,'W'), (10,'M'), (12,'F'), (9,'T'), (9,'D'), (14,'Y'), (12,'P'), 
(14,'J'), (13,'D'), (15,'H'), (12,'J'), (6,'H'), (2,'Z'), (8,'G'), (10,'Q'), (6,'D'), 
(5,'X'), (9,'T'), (6,'W'), (6,'K'), (10,'W'), (7,'J'), (11,'W'), (12,'V'), (9,'F'), (7,'F'), 
(4,'M'), (5,'K'), (12,'G'), (12,'T'), (15,'T'), (13,'W'), (7,'J'), (9,'T'), (10,'U'), (9,'S'), 
(10,'L'), (10,'J'), (10,'H'), (11,'H'), (12,'S'), (12,'A'), (14,'L'), (13,'K'), (13,'D'), 
(4,'M'), (3,'N'), (4,'F'), (7,'M'), (7,'V'), (5,'R'), (4,'K'), (5,'F'), (7,'G'), (8,'M'), 
(4,'X'), (7,'F'), (9,'S'), (7,'N'), (6,'W'), (6,'W'), (5,'S'), (9,'Z'), (10,'I'), (11,'Y'), 
(11,'D'), (9,'X'), (7,'G'), (9,'S'), (9,'H'), (9,'T'), (8,'J'), (10,'U'), (9,'F'), (9,'S'), 
(7,'D'), (14,'R'), (10,'F'), (7,'E'), (15,'M'), (12,'F'), (5,'C'), (8,'E'), (16,'G'), (11,'V'),
(10,'I'), (12,'I'), (11,'Y'), (12,'I'), (14,'J'), (15,'D'), (19,'J'), (16,'B'), (12,'G'), 
(9,'J'), (18,'J'), (18,'C'), (16,'Q'), (18,'P'), (13,'F'), (19,'T'), (15,'J'), (15,'R'), 
(15,'Q'), (15,'O'), (11,'A'), (24,'B'), (19,'S'), (22,'I'), (15,'X'), (20,'T'), (15,'E'), 
(9,'V'), (8,'H'), (16,'N'), (17,'H')
--  Since the above list was generated by a bunch of random numbers/letters I need to
-- delete the duplicates

;WITH cte AS (
  SELECT IDnumber, mut, 
     row_number() OVER(PARTITION BY IDNumber, mut ORDER BY IDNumber) AS [rn]
  FROM @myTable
)
DELETE cte WHERE [rn] > 1


SELECT *
FROM ( SELECT IDnumber, Mut FROM @myTable) AS S
PIVOT
(COUNT(Mut) FOR mut IN ([A],[B],[C],[D],[E],[F],[G],[H],[I],[J],[K],[L],[M],[N],[O],[P],
[Q],[R],[S],[T],[U],[V],[W],[X],[Y],[Z])) AS pvt

So you can see from the pivot table that the minimum IDnumbers would be 3,5,7, and 12.

How would one go about implementing the algorithm? It seems to me that I could find all the combinations (2^6) which would be sets and then determine which sets have all the Muts. The set with the smallest number of IDnumbers is then the minimal set.

That kind of brute force may work but it would be horribly inefficient. My real world case is not enormous, I have 43 unique Muts (not the nine in the example) and ~2000 IDnumbers, but I am thinking that would take some time to run because 2^2000 is pretty darn big...

Thanks!

user918967
  • 2,049
  • 4
  • 28
  • 43
  • can't you show output you are expecting – KumarHarsh Oct 28 '16 at 04:19
  • 1
    This question poses [similarities](http://stackoverflow.com/questions/25334417/minimal-number-of-groups-necessary-to-cover-user-product-permissions) but no answer. Also [this question did receive answers](http://stackoverflow.com/questions/28202429/dynamic-t-sql-approach-for-combinatorics-knapsack) which you may be able to alter to your requirement – M O'Connell Oct 28 '16 at 05:23
  • Can you provide the input for all of your data? This is an intriguing question I'd like to dig into and having more data would be helpful. – Eric J. Price Oct 28 '16 at 07:26
  • I'm not sure you appreciate that that there is no (known) way of escaping an exponential-time algorithm here. Set Cover is NP-complete; the best you can hope for is to reduce the base down from 2 (difficult; probably requires thorough knowlege of CS theory) or the constant factor (usually less difficult). – j_random_hacker Oct 28 '16 at 11:07
  • I added more data to the question. – user918967 Oct 28 '16 at 15:00
  • 1
    @user918967 - I think you need to update the minimum list of IDNumbers for your new result set – Ed Harper Oct 28 '16 at 15:18
  • @user918967 I came up with 3,5,7,12 with the new data. – Eric J. Price Oct 28 '16 at 17:11

2 Answers2

2

I would like a larger dataset to test on, but this matches your results for at least the dataset given...

DECLARE @myTable TABLE (
        IDnumber INT,
    Mut VARCHAR(1))

DECLARE @results TABLE (
    IDnumber INT)

INSERT @myTable VALUES 
    (1,'A'),(1,'B'),(1,'C'),(1,'D'),
    (2,'A'),(2,'C'),(2,'D'),
    (3,'A'),(3,'B'),(3,'F'),(3,'Z'),
    (4,'A'),(4,'B'),(4,'E'),(4,'F'),
    (5,'Y'),
    (6,'X'),(6,'Z')

DECLARE @IDnumber INT

WHILE EXISTS (SELECT 1 FROM @myTable)
BEGIN

    WITH MutRank AS
    (
        -- Find how many IDNumbers are associated with each mut
        SELECT Mut,
            COUNT(DISTINCT IDnumber) AS IDnumberCnt
        FROM @myTable
        GROUP BY Mut
    ), MutRarity AS
    (
        -- Translate the IDNumberCnt into a weighted rarity so dupes at  
        -- a IDNumberCnt level reduce their rarity compared to the next level 
        SELECT Mut,
            RANK() OVER (ORDER BY IDnumberCnt DESC) AS MutRarity
        FROM MutRank
    )
    -- Grab the IDNumber that is associated to the most "rare" muts
    SELECT @IDnumber = n.IDnumber
    FROM (SELECT TOP 1 m.IDnumber,
            SUM(MutRarity) AS IDNumberValue
        FROM @myTable m
        JOIN MutRarity mr
            ON m.Mut = mr.Mut
        GROUP BY m.IDnumber
        ORDER BY IDNumberValue DESC, IDnumber) n

    -- Save the number in our results
    INSERT @results (IDnumber)
    SELECT @IDnumber

    -- Purge all records for the "covered" muts and the selected IDNumber
    DELETE m2
    FROM @myTable m1
    JOIN @myTable m2
        ON m1.Mut = m2.Mut
        AND m1.IDnumber = @IDnumber
END

SELECT *
FROM @results
ORDER BY IDnumber
Eric J. Price
  • 2,740
  • 1
  • 15
  • 21
  • nice solution but the other solution gives *all* the sets.Thanks! – user918967 Oct 28 '16 at 20:49
  • Yeah, I thought you were looking for the single best solution with the least combination of the lowest numbers (so in your original set both 1,4,5,6 and 2,4,5,6 worked, but I wanted the first to return); I thought you were trying to avoid having to return them all and then get the smallest set. That said, I agree, @EdHarper solution is a very elegant solution. When I saw it I was impressed. – Eric J. Price Oct 28 '16 at 21:01
2

Here's an approach which calculates a bitmask of Mut values for each IDNumber, then uses a recursive CTE to generate all the possible combinations which complete the set. The code outputs all the possible IdNumber combinations which give the final result, excluding those where one or more IdNumber is redundant in a combination (such as 1,2,3,4,5,6 in the sample data set).

To convert this to work with your real data, the major difference is going to be that you'll almost certainly need to find another mechanism to assign bit values to Mut values. (I can cheat a bit here and calculate the bit values by manipulating the decimal ASCII code for each Mut letter - POWER(2,ASCII(Mut) - 64)).

DECLARE @myTable TABLE (
    IDnumber int,
    Mut varchar(1))

INSERT @myTable VALUES 
    (1,'A'),(1,'B'),(1,'C'),(1,'D'),
    (2,'A'),(2,'C'),(2,'D'),
    (3,'A'),(3,'B'),(3,'F'),(3,'Z'),
    (4,'A'),(4,'B'),(4,'E'),(4,'F'),
    (5,'Y'),
    (6,'X'),(6,'Z')

-- calculate the target bitmask
DECLARE @target bigint = (  SELECT SUM(POWER(2,ASCII(Mut) - 64))
                            FROM    (SELECT DISTINCT Mut FROM @myTable) AS x
                         )

;WITH baseCTE
AS
(
    --calculate a starting bitmask for each ID
    SELECT IDnumber, sum(mutbit) AS bitmask 
    FROM    (SELECT DISTINCT IDnumber, CAST(POWER(2,ASCII(Mut) - 64) AS bigint) AS mutbit FROM @myTable) AS x
    GROUP BY IDnumber
)
,recCTE
AS
(
    SELECT IDnumber, bitmask, CAST(IdNumber AS varchar(max)) AS trail, 1 AS lev
    FROM baseCTE
    UNION ALL
    SELECT b.IDnumber, b.bitmask | r.bitmask, CAST(CONCAT(r.trail,',',b.IDnumber) AS varchar(max)), r.lev + 1
    FROM recCTE AS r
    JOIN baseCTE AS b
    ON b.IDnumber > r.IDnumber
    WHERE b.bitmask | r.bitmask <> r.bitmask --ignore combinations which don't add any new Mut values
)
SELECT trail, lev 
FROM recCTE
WHERE bitmask = @target
ORDER BY lev

NB. SQL Server bitwise operators only work where one or other value is of an integer type, so this solution is restricted to 64 distinct Mut values (where the mask is a bigint) - to get it to work beyond that, you'd have to use a custom bitwise comparator (possibly using the CLR). However, since the question mentions that you have 43 Mut values, it might work for you for now.

Ed Harper
  • 21,127
  • 4
  • 54
  • 80
  • @user918967 - I've made a small update to my answer to deal with duplicates in the larger example set you posted. – Ed Harper Oct 28 '16 at 15:11
  • I really like your solution because it gives me all the sets! Could you also send me a pointer as to how to extend the custom bitwise comparator to more than 64 values? – user918967 Oct 28 '16 at 15:14
  • @user918967 - there's an old (2006) series of posts by Adam Machanic on implementing bitwise logic for large bitmasks - http://sqlblog.com/blogs/adam_machanic/archive/tags/bitmasks/default.aspx. For this problem you'll only need to implement bitwise OR (from part 3) – Ed Harper Oct 28 '16 at 15:23
  • @user918967 - it would potentially also be possible to write a CLR function to do the bitwise comparison. This is a big topic but you might be able to find an example somewhere. – Ed Harper Oct 28 '16 at 15:26
  • @user918967 - it might also be possible to split the bitmask across multiple bigints, and process the problem 64 `Mut` values at a time. – Ed Harper Oct 28 '16 at 15:29
  • @EdHarper: in the answer above you gave the solution for Mut as varchar(1). What would be the solution for Mut as varchar(50), for example – Catalin Mar 09 '19 at 06:33
  • 1
    @Catalin - the simplest solution would probably be to assign each of the `varchar(50)` values a unique `varchar(1)` value (for example by adding a column to the source data table), then you could use the same logic shown in this answer. – Ed Harper Mar 09 '19 at 10:25
  • @Catalin - ... but the important part is the bitmask - you could skip the `varchar(1)` step and assign a power of 2 value directly to each of the `varchar(50)` values. – Ed Harper Mar 09 '19 at 12:14