2

I've been working in SQL Server 2008 R2 to develop a stored procedure that can match split quantities. I'm having trouble with the algorithm; the one I developed, which I'll describe, can't scale past 55 items. If an alternate solution is available, the name of the algorithm will do. I can implement it.

Problem:

I have a list of quantities, and the sum of some subset of those numbers needs to equal a target number. Quantities are not unique. For example, assume you have quantities:

1, 2, 3, 8

and the target quantity is 11. Two solutions exist: 1+2+8 and 3+8. It doesn't matter which solution is found, I only need a solution.

This is how I've tried to solve it (keep in mind, this is in SQL). Assign a bitmask to each quantity:

1:  0001
2:  0010
3:  0100
8:  1000

Using a counter from 15 to 1 and using bitwise-and, we get:

15:  1111: 1, 2, 3, 8     total: 14
14:  1110: 2, 3, 8        total: 13
13:  1101: 1, 2, 8        total: 11 (solution found)
...
2:   0010: 2
1:   0001: 1

This works great... when the number of items is less than 56. At that point, 2^57 gets truncated. I'm using decimal(38,0) as my data type so I have 17 bytes (136 bits). I'm pretty much guaranteed never to have more than 125 items, so this is fine. But at 57, my bitmask strategy fails.

Does this make sense? Please comment if I need to clarify the algorithm.

Question 1: Can this algorithm be made to scale out to 125? If not, or if another solution would be more efficient:

Question 2: What is the name of another algorithm I can implement to solve the problem? Surely this sort of problem is common enough that it has a named algorithm.... right?

My implementation, for those interested

Uncomment the creation and filling of foo to create the test data.

--create table foo
--(
--  ID          int         primary key identity(1,1),
--  Qty         int         not null
--)
--go

--create nonclustered index IX_FOO__Qty on foo (Qty) include (ID)
--go

--insert into foo (Qty) values (1)
--insert into foo (Qty) values (1)
--insert into foo (Qty) values (2)
--insert into foo (Qty) values (3)
--insert into foo (Qty) values (7)

declare @seek int = 9,
        @total int = 0,
        @n int,
        @oldn int = 0,
        @i int

select @i = SQUARE(count(1))-1 from foo     -- start counting down from the top

select 
    x.ID, 
    x.Qty, 
    x.v as flags, 
    0 isanswer,
    convert(varbinary(16), v) hex
into #tmp
from
(
    select f.ID, f.Qty, POWER(2, row_number() over(order by f.qty) - 1) v
    from foo f
) x

while @i > 0
begin
    select @total = sum(Qty), @n = count(ID) from #tmp t where flags & @i != 0

    if (@total = @seek and @oldn < @n)
    begin
        update #tmp set isanswer = case when flags & @i != 0 then 1 else 0 end
        select @oldn = @n
    end

    select @i = @i - 1
end


select * from #tmp where isanswer = 1
drop table #tmp

This has the result set:

ID  Qty     flags   isanswer   hex
1   1       1       1          0x00000001
2   1       2       1          0x00000002
5   7       16      1          0x00000010

Any thoughts? I recognize I would have a much easier time doing this in C#, but if a SQL solution is possible...

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • I could use nested loops, but an arbitrary number of items might need to go into the subset. If I use 5 nested loops, but it takes 6 items to sum to the target, then no solution would be found. Is it possible to do recursion, or to split my bitmask somehow? –  Jul 11 '13 at 23:02
  • battery dying sorry :http://stackoverflow.com/questions/4632322/finding-all-possible-combinations-of-numbers-to-reach-a-given-sum + http://msdn.microsoft.com/en-us/library/aa175801(v=sql.80).aspx ? –  Jul 11 '13 at 23:14
  • if it would be much easier in C#, would it be better to write a method in C# and plug it into sql as an extended stored procedure? – John Bingham Jul 12 '13 at 00:22
  • I wrote a solution for this a few years ago that achieved `O(Sqrt(2^N))` in T-SQL (that's as good as you can do unless you can apply the pseudo-polynomial solution). I'll see if I can find it tomorrow ... – RBarryYoung Jul 12 '13 at 04:51
  • You say "*I'm using decimal(38,0)..*" but I don't see any `DECIMAL` datatypes at all in your code listing. In fact, I don't see anything but `int`s, which are only 32 bits, so how did you ever get to 57 bits? Finally, exactly *how long* does it take your code to get to enumerate up to 2^57? Because my estimate is somewhere between 1 month and 1 year, which makes it seem like you may not have ever actually run it. – RBarryYoung Jul 12 '13 at 05:08
  • @RBarryYoung, oops, yeah. I had taken the posted code and adapted it to real-world data, which is where I encountered my scaling problems, which I attempted to overcome by switching to `decimal(38,0)`. Sorry for the confusion. And no, I haven't run the real-world code *because of the scaling problems I described.* –  Jul 12 '13 at 15:37

2 Answers2

1

This problem is closely related to the knapsack problem and the subset sum problem.

Knapsack problem: Given a bunch of items, each with a weight and a value, find a subset whose total weight is below a certain target and whose total value is maximal. Your problem can be viewed as a variant where the weight and the value of each item are equal.

Subset sum problem: Given a bunch of integers, find a non-empty subset whose sum is equal to 0 (or some other target value). Your problem can be viewed as a variant with the added restriction that all the integers are positive.

Both of these problems are NP-complete when viewed only in terms of the number of items. This means that in general the time it takes to solve the problem is exponential in the number of elements.

However, if the target value is restricted there is a dynamic programming solution which is O(N*T), where N is the number of elements and T is the target. So the problem can be solved when there are 125 elements provided the target value is not huge.

Here is the algorithm in pseudocode:

function solve(quantities, target):

    possible[0 .. target+1] = array of booleans initialized to False
    choices[0 .. target+1] = array of IDs

    possible[0] = True

    -- compute solutions to all subproblems
    for each id:
        v = quantities[id]
        -- try to add v to all existing solutions
        for i = target down to v:
            if possible[i - v] and not possible[i]:
                possible[i] = True
                choices[i] = id

    if not possible[target]:
        fail

    -- backtrack to get the full solution
    soln = []
    while target != 0:
        id = choices[target]
        soln.append(id)
        target = target - quantities[id]
    return soln
tom
  • 21,844
  • 6
  • 43
  • 36
0

Unfortunately, the subset sum problem is NP-complete, so you are not likely to find exact algorithms that perform well for larger inputs. There exists an approximation algorithm (explained in the article) whose runtime depends on how close you need to get. If you need an exact solution, the dynamic programming that is described in the article will likely outperform your current solution.

Aasmund Eldhuset
  • 37,289
  • 4
  • 68
  • 81