7

Let's say I have problem like this:

  • Knapsack capacity = 20 million
  • Number of item = 500
  • Weight of each item is random number between 100 to 20 million
  • Profit of each item is random number between 1 to 10

So which is the best method for my problem? GA or Dynamic Programming?

Please give me a simple explanation, as I'm newbie in this..

halfer
  • 19,824
  • 17
  • 99
  • 186
hanx
  • 83
  • 1
  • 3

4 Answers4

5

Dynamic Programming (DP):

  • Exact algorithm - Finds global optimal solution
  • Long running time
  • Uses a lot of memory
  • Very simple to implement

Genetic Algorithm (GA):

  • Estimation - Doesn't necessarily find the global optimal solution
  • Short running time
  • Memory usage depends on number of individuals but is generally managable
  • Quality of solution depends on choosing an efficient representation + letting it run long enough
  • Reasonably simple to implement, design decisions may be a little more complex, especially if you don't have significant experience with GAs

Hill climbing:

  • Estimation - Doesn't necessarily find the global optimal solution. More subject to halting at a local optimum than a GA, though there are ways to reduce the chance of this happening
  • Short running time
  • Very low memory usage
  • Very simple to implement

DP (or any exact algorithm for an NP-complete problem) is generally only a good idea for a reasonably small problem, or if finding the global optimal is the most important thing.

There are 2 approaches to DP: (there is a reasonably simple optimization where you only store 2 rows, my memory usage analysis goes on the assumption that this optimization is used)

  • Have a matrix of items x weight, with cell value being the maximum value

    Matrix size = 500 x 20 000 000

    Running time = O(500 * 20 000 000) = O(10 000 000 000)

    Memory = max 10 * 500 -> 5 000 -> short = 2 bytes -> 2 * 20 000 000 * 2 = 80 000 000 < 80 MB

    Explanation: A[i,j] below represents the best (highest) value obtainable from any subset of the elements 1 to i with weight less than or equal to j. The update rule below means - find the best value between either not including the current element (thus the weight and value stays the same) or including it (so lookup the optimal value for (the current weight minus the current item's weight) and add the value of the current item to it). Then just return the A[500, 20000000], which represents the highest value obtainable from any subset of all elements with a maximum weight of the knapsack size.

    Algorithm:

    A[0, 0..20000000] = 0
    for i = 1:500
    for x = 0:20000000
      A[i, x] = max(A[i-1, x], value(i) + A[i-1, x-weight(i)])
    // ignore value(i) + A[i-1, x-weight(i)] when weight(i) > x
    return A[500, 20000000]
    
  • Have a matrix of items x value, with cell value being the minimum weight

    Matrix size = 500 x 10*500

    Running time = O(500 * 10*500) = O(2 500 000)

    Memory = max 20 000 000 -> int = 4 bytes -> 2 * 500 * 4 = 4 000 < 4 KB

    Explanation: A[i,j] below represents the lowest weight obtainable from any subset of the elements 1 to i with value equal to j. The update rule below means - find the best value between either not including the current element (thus the weight and value stays the same) or including it (so lookup the optimal value for (the current value minus the current item's value) and add the weight of the current item to it). The value at any cell is the exact weight of a subset to produce that value, so we need to look through all the cells A[500, x], which represents minimum weight subsets of elements for any value x.

    Algorithm:

    A[0, 0] = 0
    A[0, 1..5000] = ∞
    for i = 1:500
    for x = 0:5000
      A[i, x] = min(A[i-1, x], weight(i) + A[i-1, x-value(i)])
    // interpret A[i-1, x-value(i)] as 0 when value(i) > x
    return largest x that A[500, x] <= 20000000
    

So, yeah, the complexities pretty much speak for themselves, you'll wait a few hours for the first way, but mere seconds for the second, and there's a similar difference in memory usage (though 80 MB is still near negligible) (note that this is FAR from a rule, every case needs to be analysed in its own right).

Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
  • The size of the problem is not so bad. I think in this question, all the values are integers, so there is pseudo linear algorithm, which can be completed in the order of multiple hours. – iampat Feb 13 '13 at 08:40
  • so GA is not always give the best result right? If I'm using DP, I have to scale down the number of weight and capacity right? Can I combine DP with scaling (and rounding)? will this affect the result? – hanx Feb 13 '13 at 08:50
  • @user2067479 No, a GA doesn't always give the correct result. Scaling and rounding (especially in this case) may significantly reduce the quality of the result. I'll edit the answer in a few hours after I've checked something, if I remember. – Bernhard Barker Feb 13 '13 at 09:07
  • great..waiting for that.. – hanx Feb 13 '13 at 09:09
  • DP will definitely be a lot slower, but it won't run out of memory. You can fit it in 40 MB in this case. – IVlad Feb 13 '13 at 14:17
  • what about GA computation? how complex it is? – hanx Feb 13 '13 at 16:33
  • 1
    @user2067479 As promised, see edit, hopefully that also clears up your GA question. – Bernhard Barker Feb 13 '13 at 17:08
  • @IVlad Edited. I did briefly forget about the optimization where you only store 2 rows. – Bernhard Barker Feb 13 '13 at 17:09
  • See my answer. You do not even need 2 rows, one is enough :). +1 otherwise! – IVlad Feb 13 '13 at 17:12
  • Wait, can you describe your second DP approach better? I don't understand how that works in this case. – IVlad Feb 13 '13 at 17:13
  • @IVlad Added pseudo-code. I don't have a link for it, got it from an [online algorithms course](https://class.coursera.org/algo2-2012-001). – Bernhard Barker Feb 13 '13 at 17:50
  • Ah, you're minimizing the weight for each gain, didn't think of that. It should work I think, and in this case DP is definitely the way to go. – IVlad Feb 13 '13 at 18:15
  • @Dukeling more clear now. But in your second DP approach I don't really get it. Why the matrix size so much different? BTW, I appreciate all your answers guys.. – hanx Feb 13 '13 at 18:22
  • @user2067479 Added explanations. The matrix sizes are so different because the one is index by weight (goes up to 20000000), the other by value (goes up to 5000). – Bernhard Barker Feb 13 '13 at 20:52
3

Dynamic programming can run in time O(numItems * knapsackCapacity) and O(knapsackCapacity) memory. This means that, for your specifications, you would have:

  • about 20.000.000 x 500 = 10.000.000.000 operations - would probably finish executing in a few hours, depending on your machine;
  • since the profit of one item is at most 10 and you can have at most 500 items, that means your maximum theoretical profit cannot exceed 10 x 500 = 5000. This can be represented on 2 bytes (a short). So you will also need 2 x 20.000.000 / 1024 / 1024 ~= 38 MB of memory to store your DP array. Again, not that much.

The others have already mentioned the advantages and disadvantages of each. GA will finish faster, but the DP solution should also finish in a few hours, and it will give you an exact answer.

Personally, I would choose DP if waiting a few hours for the solution is not a problem.

Note: here is the DP that runs with O(knapsackCapacity) memory:

dp[i] = maximum profit obtainable for weight i
for i = 1 to numItems do
  for j = knapsackCapacity down to items[i].weight do
    dp[j] = max(dp[j], dp[j - items[i].weight] + items[i].profit)
IVlad
  • 43,099
  • 13
  • 111
  • 179
0

There is no "best" method, each has it's pros and cons.
In this case the trade off is between optimality of the solution found - GA does not in any way guarantee to find the best solution.
And the time/space required to compute the solution - using dynamic programming will save you some redundant computations but you will still have to compute all possible combinations at least once to find the best solution (or maybe even any solution at all).

yurib
  • 8,043
  • 3
  • 30
  • 55
0

First you need to consider Dynamic programming as an exact algorithm which can guarantee that the answer is going to be an optimum answer. On the other hand GA is a heuristic algorithm which usually converge to a local optima. There is a dynamic programming solution for the knapsack which is pseudo linear (O(capacity*number_of_items)) if they are integers. In your case, you need about 1e10 operation and memory. Which are feasible for a single computer. So if finding the optimum answer is important for you and you have enough time (in the order of a couple of hours) you can use a dynamic programming approach. Otherwise, you may prefer to use a heuristic algorithm, such as GA.

iampat
  • 1,072
  • 1
  • 12
  • 23