9

I'm re-reading Skiena's Algorithm Design Manual to catch up on some stuff I've forgotten since school, and I'm a little baffled by his descriptions of Dynamic Programming. I've looked it up on Wikipedia and various other sites, and while the descriptions all make sense, I'm having trouble figuring out specific problems myself. Currently, I'm working on problem 3-5 from the Skiena book. (Given an array of n real numbers, find the maximum sum in any contiguous subvector of the input.) I have an O(n^2) solution, such as described in this answer. But I'm stuck on the O(N) solution using dynamic programming. It's not clear to me what the recurrence relation should be.

I see that the subsequences form a set of sums, like so:

S = {a,b,c,d}

a    a+b    a+b+c    a+b+c+d
     b      b+c      b+c+d
            c        c+d
                     d

What I don't get is how to pick which one is the greatest in linear time. I've tried doing things like keeping track of the greatest sum so far, and if the current value is positive, add it to the sum. But when you have larger sequences, this becomes problematic because there may be stretches of negative numbers that would decrease the sum, but a later large positive number may bring it back to being the maximum.

I'm also reminded of summed area tables. You can calculate all the sums using only the cumulative sums: a, a+b, a+b+c, a+b+c+d, etc. (For example, if you need b+c, it's just (a+b+c) - (a).) But don't see an O(N) way to get it.

Can anyone explain to me what the O(N) dynamic programming solution is for this particular problem? I feel like I almost get it, but that I'm missing something.

Community
  • 1
  • 1
user1118321
  • 25,567
  • 4
  • 55
  • 86

3 Answers3

11

You should take a look to this pdf back in the school in http://castle.eiu.edu here it is:

enter image description here

The explanation of the following pseudocode is also int the pdf. enter image description here

edgarmtze
  • 24,683
  • 80
  • 235
  • 386
  • 2
    I'm confused by the chart, since it skips several of the sub-sequences (such as [5, 15], and [15, -30]). But I will read through the PDF and see if it makes more sense. Thank you! – user1118321 Dec 27 '11 at 22:26
  • OK, after reading over it, it makes much more sense now. Thank you so much! – user1118321 Dec 28 '11 at 00:25
  • 6
    @cMinor link is broken. – Cacho Santa Oct 30 '13 at 21:52
  • After a lot of searching, this looks like one of the few places that has the right definition- the link is broken though – whizvids Mar 07 '16 at 18:44
  • Given pseudocode won't work if all the numbers are negative. – jblixr Jan 26 '17 at 12:08
  • @jblixr It will work with the assumption that the sum of an empty sequence is 0, which is reasonable because 0 is the identity for adding. Then, if all numbers are negative, the empty sequence is the greatest subsequence. – matj1 Apr 07 '23 at 14:17
2

My understanding of DP is about "making a table". In fact, the original meaning "programming" in DP is simply about making tables.

The key is to figure out what to put in the table, or modern terms: what state to track, or what's the vertex key/value in DAG (ignore these terms if they sound strange to you).

How about choose dp[i] table being the largest sum ending at index i of the array, for example, the array being [5, 15, -30, 10]

The second important key is "optimal substructure", that is to "assume" dp[i-1] already stores the largest sum for sub-sequences ending at index i-1, that's why the only step at i is to decide whether to include a[i] into the sub-sequence or not

dp[i] = max(dp[i-1], dp[i-1] + a[i])

The first term in max is to "not include a[i]", the second term is to "include a[i]". Notice, if we don't include a[i], the largest sum so far remains dp[i-1], which comes from the "optimal substructure" argument.

So the whole program looks like this (in Python):

a = [5,15,-30,10]

dp = [0]*len(a)
dp[0] = max(0,a[0]) # include a[0] or not

for i in range(1,len(a)):
    dp[i] = max(dp[i-1], dp[i-1]+a[i]) # for sub-sequence, choose to add or not     


 print(dp, max(dp)) 

The result: largest sum of sub-sequence should be the largest item in dp table, after i iterate through the array a. But take a close look at dp, it holds all the information.

Since it only goes through items in array a once, it's a O(n) algorithm.

This problem seems silly, because as long as a[i] is positive, we should always include it in the sub-sequence, because it will only increase the sum. This intuition matches the code

dp[i] = max(dp[i-1], dp[i-1] + a[i])

So the max. sum of sub-sequence problem is easy, and doesn't need DP at all. Simply,

sum = 0
for v in a:
     if  v >0
         sum += v

However, what about largest sum of "continuous sub-array" problem. All we need to change is just a single line of code

dp[i] = max(dp[i-1]+a[i], a[i])    

The first term is to "include a[i] in the continuous sub-array", the second term is to decide to start a new sub-array, starting a[i].

In this case, dp[i] is the max. sum continuous sub-array ending with index-i.

This is certainly better than a naive approach O(n^2)*O(n), to for j in range(0,i): inside the i-loop and sum all the possible sub-arrays.

One small caveat, because the way dp[0] is set, if all items in a are negative, we won't select any. So for the max sum continuous sub-array, we change that to

dp[0] = a[0]
Zhe Hu
  • 3,777
  • 4
  • 32
  • 45
  • BTW, it is `max(dp)` rather than `dp[-1]` is because the sub-sequence or sub-array may not include the last element of the array. – Zhe Hu Aug 22 '16 at 20:05
1

There is a solution like, first sort the array in to some auxiliary memory, then apply Longest Common Sub-Sequence method to the original array and the sorted array, with sum(not the length) of common sub-sequence in the 2 arrays as the entry into the table (Memoization). This can also solve the problem

Total running time is O(nlogn)+O(n^2) => O(n^2) Space is O(n) + O(n^2) => O(n^2)

This is not a good solution when memory comes into picture. This is just to give a glimpse on how problems can be reduced to one another.

FancyPants
  • 101
  • 9