6

I've been working through some algorithm programming problems and have a question about one. The problem is the same one as the one referenced in this question: USACO: Subsets (Inefficient)

I was able to code some (non-dynamic) solutions that were too slow for high N. Had to cheat a bit and look up some solutions online. It turns out the fast algorithm is trivially simple, but even knowing the answer I still can't see how to get from the problem to the answer. I can see patterns in the way subsets of equal sums form, but I don't see the link between those patterns and the algorithmic solution.

The problem (link above) is:

Given a set of consecutive integers from 1 through N (1 <= N <= 39), how many ways can the set be partitioned into two subsets whose sums are identical? E.g., {1,2,3} can be partitioned one way: {1,2} {3}.

For larger sets the answer is either 0 (when N*(N+1)/2 is odd) or is given by this simple algorithm:

  arr = array of int with (N*(N+1)/4)+1 elements
  arr[0]=1  // all other elements initialized to 0
  for i = 1 to N
    for j = N*(N+1) / 4 downto i
      add arr[j-i] to arr[j]
  subsetpaircount = arr[N*(N+1)/4] / 2

Again, I can see how the algorithm works, I've even inserted print statements so I can "watch" how it works. I just can't see how the operation of the algorithm links up to the pattern of different ways the two-set partitions are generated.

A response in the linked question may be relevant, but I also can't connect up how it works: "This is the same thing as finding the coefficient x^0 term in the polynomial (x^1+1/x)(x^2+1/x^2)...(x^n+1/x^n). . . ."

Can anyone clarify the connection for me, or point me to some reference materials that explain this specific problem? Thanks.

Community
  • 1
  • 1
Herbert Sitz
  • 21,858
  • 9
  • 50
  • 54
  • What does 'increment arr[j] by arr[j-1]' mean? I think of increment as being something that adds 1 to a number. Further, is there an assumption in the algorithm that arr[i] begins with value 0? If not, what is that starting value for the array cells? – Lee Meador Jan 24 '13 at 22:01
  • @Lee, sorry my pseudocode conventions may not be perfect. I program often in Pascal, where the function inc(arg1,arg2) will increment arg1 by amount arg2. arr[i] does begin with element zero, and I left out key step in initializing its value, which I just added. . . – Herbert Sitz Jan 24 '13 at 22:27
  • 1
    @Lee: The use of increment is correct. It [means](https://www.google.com/search?q=define+increment) "An increase or addition, esp. one of a series on a fixed scale: "pay can escalate in five-cent increments"." – Ken White Jan 24 '13 at 22:44

2 Answers2

7

If the set S = {1,...,N} is partitioned into two subsets with equal sum, that sum must be half of the sum of S; the sum of S is N(N+1)/2 so the sum of each subset in the partition must be N(N+1)/4. It must also be an integer, hence N(N+1)/2 must be even.

So the question reduces to finding the number of subsets of S whose sum is N(N+1)/4. The total number of partitions will be exactly half this number, since each partition contains two such subsets, and no two partitions share a subset.

That much should be obvious.

Now, let's list the subsets of S which sum to k, for any k and any set S. Any such subset must have a greatest value, which must be an element of S. The greatest value must either be the greatest element of S, or it must be less than the greatest element of S. These two groups of subsets are distinct, so we can enumerate them separately. Let's call the greatest element of S Smax.

The second group is simple: it's simply the subsets of S - { Smax } which sum to k. We can find those by recursively calling the subset lister. But the first group is almost as simple: each set in the group includes Smax and the rest of its elements are some set in S - { Smax } which adds up to k - Smax, which again we can list recursively. To complete the recursion, we note that if S is the empty set, then if k = 0, there is precisely one qualify subset (the empty set itself), and if k is not 0, then there are no qualifying subsets. Each recursion removes one element from S, so the termination condition must eventually be reached.

It should be clear that the subsets of S which will be used by the above recursive function are just the numbers from 1 to Smax, so we can get rid of S altogether, and write the recursion as follows:

Subsets(min, max, k) =
  Subsets(min, max - 1, k)
  &Union; { {max, P} | P ∈ Subsets(min, max - 1, k - max) }

But we only want the count of the number of partitions, so we can simplify that a bit:

Count_Subsets(min, max, k) =
  Count_Subsets(min, max - 1, k)
  &plus; Count_Subsets(min, max - 1, k - max)

We need to complete the recursion by adding the end condition:

If min > max, Count_Subsets(min, max, k) = 1 if k = 0; otherwise 0

(In fact, it's simple to show that the recursion implies that the value will be 1 when k decrements to 0, and 0 if k is less than 0, so we can make the termination condition happen a lot earlier.)

That gives us the simple recursion for the count. But since it calls itself twice, it would be better to work backwards ("dynamic programming"). We need to compute Count_Subsets(1, N, N*(N+1)/4), which will require us to compute values of Count_Subsets(1, max, k) for all values of max from 1 to N, and from all values of k from 0 to N*(N+1)/4. We do that by starting with max = 0, and working up until we reach min = N. And that's precisely what your algorithm does; i is max, and the array is the set of values for k from 0 to N(N+1)/4.

By the way, as should be apparent from the above description, a[j] should be incremented by a[j - i], not by a[j - 1]

rici
  • 234,347
  • 28
  • 237
  • 341
  • 1
    Great explanation, thanks! I had gotten the obvious part, but I wasn't going to grok the rest of it without help, even having seen the final algorithm and knowing that answer was being built up from smaller pieces. Or maybe I just wasn't dedicated or persistent enough. . . – Herbert Sitz Jan 25 '13 at 00:01
  • Also, any suggestions on some introductory text that might cover this type of problem in detail? I have Cormen, where there's section on partitioning and the subset sum problem, but it was way too cursory a treatment for me, esp. when I don't fully understand relation b/w this problem and the classic "subset sum" problem. I don't need to be spoonfed, but something with a little more detail and less for me to piece together myself would be helpful. – Herbert Sitz Jan 25 '13 at 00:13
  • @HerbertSitz, it's just the application of the simplest form of dynamic programming to simplify a recursion. Cormen has a whole chapter on DP, but for whatever reasons underemphasizes the fact that DP is basically just recursion turned on its head, something which Sedgwick at least mentions in his much briefer outline (I'm a big fan of Sedgwick, fwiw.) DP is usually used for optimization problems, but it can be used to efficiently compute many recursive problems. It's worth adding to your mental toolkit. – rici Jan 25 '13 at 00:25
  • @HerbertSitz: Another way of solving branching recursion (which would otherwise have exponential computation time) is memoization, or caching results. That can be more precise than DP, because it only computes the values you need. On the other hand, the overhead of memoization might be a lot higher than the overhead of computing some extra values. Exercise: for the partition problem, write both a memoization and a DP solution, and compare the number of subproblems solved. Can you characterize the complexity of both approaches? Which is better in theory? In practice? – rici Jan 25 '13 at 00:32
  • @HerbertSitz, and finally: the reason DP is mostly used for optimization problems is that recursion is not useful if you don't know where to start. DP only requires that you know where to finish, and you can explore the paths backwards to the finishline in the right order. Contrast this with depth-first search. – rici Jan 25 '13 at 00:35
  • Thanks again. A 1990 edition of Sedgwick is main book I like and I'm working through, but I didn't find much help there. In any case, I think the tips from you are giving me a foothold I can work from. I will try to find a memoization solution; not sure where to start there since it seems the DP algorithm already uses a form of memoization itself. – Herbert Sitz Jan 25 '13 at 01:01
  • @HerbertSitz: Precisely. The difference is that DP constructs the cache by starting from the end (Count(1, 0, 0)) and computing every value which (might possibly) be used. The caching recursive solution starts from the beginning (Count(min, max, K)) and then only computing needed values that have not yet been memoized. Try just fibonacci(n) (recursive: start with n; DP: start with (0, 1)) if that didn't make enough sense. – rici Jan 25 '13 at 01:14
1

I think there might be a mistake in your pseudocode which is causing the confusion. I would expect the line

add arr[j-1] to arr[j]

to be

add arr[j-i] to arr[j]

Assuming this is the case, then the way to think about this problem is that at the start of each iteration of the loop over i, the array arr[j] contains the number of ways of choosing a subset of the integers 1,2,...,i-1 such that the sum of the chosen integers is exactly j.

When you start, i=1 so the only choice of subset is the empty subset with sum equal to 1.

This is why arr[0]=1 (representing using an empty subset to get a total of 0) while all other entries are 0 (as there is no way of getting a non-zero sum).

From then on, each pass of the iteration considers adding the number i to the subset. The number of ways of getting a sum j will depend on whether i is included or not.

If i is not included, then we have the same number of ways as before (i.e. arr[j]).

If i is included, then in order to have a sum of j including i, we must add i to all the subsets of 1,..,i-1 which had a total of j-i. By design, our array contains exactly this number if we look at index j-i.

So the total number of ways of getting a sum of j becomes arr[j]+arr[j-i].

When the i loop completes, arr gives you the number of ways of choosing a subset and getting any sum you wish. We know that the sum of 1,2,..,n is n*(n-1)/2, so if we count how many subsets reach half this value then we are counting the partitions into equal sums.

Actually, this overcounts by a factor of 2 because it counts {1,2}/{3] and {3}/{1,2} as separate solutions, so the final answer is divided by 2.

Peter de Rivaz
  • 33,126
  • 4
  • 46
  • 75
  • Thanks! That is especially helpful bit about the incrementing of i and how, by design, the array will already have count of subsets to add for case where i is part of current set. – Herbert Sitz Jan 25 '13 at 00:09