Memory-constrained coin changing for numbers up to one billion

Question

I faced this problem on one training. Namely we have given N different values (N<= 100). Let's name this array A[N], for this array A we are sure that we have 1 in the array and A[i] ≤ 10⁹. Secondly we have given number S where S ≤ 10⁹.

Now we have to solve classic coin problem with this values. Actually we need to find minimum number of element which will sum to exactly S. Every element from A can be used infinite number of times.

Time limit: 1 sec
Memory limit: 256 MB

Example:

S = 1000, N = 10

A[] = {1,12,123,4,5,678,7,8,9,10}. The result is 10.

1000 = 678 + 123 + 123 + 12 + 12 + 12 + 12 + 12 + 12 + 4

What I have tried

I tried to solve this with classic dynamic programming coin problem technique but it uses too much memory and it gives memory limit exceeded.

I can't figure out what should we keep about those values. Thanks in advance.

Here are the couple test cases that cannot be solved with the classic dp coin problem.

S = 1000000000 N = 100

1 373241370 973754081 826685384 491500595 765099032 823328348 462385937 
251930295 819055757 641895809 106173894 898709067 513260292 548326059 
741996520 959257789 328409680 411542100 329874568 352458265 609729300 
389721366 313699758 383922849 104342783 224127933 99215674 37629322 
230018005 33875545 767937253 763298440 781853694 420819727 794366283 
178777428 881069368 595934934 321543015 27436140 280556657 851680043 
318369090 364177373 431592761 487380596 428235724 134037293 372264778 
267891476 218390453 550035096 220099490 71718497 860530411 175542466 
548997466 884701071 774620807 118472853 432325205 795739616 266609698 
242622150 433332316 150791955 691702017 803277687 323953978 521256141 
174108096 412366100 813501388 642963957 415051728 740653706 68239387 
982329783 619220557 861659596 303476058 85512863 72420422 645130771 
228736228 367259743 400311288 105258339 628254036 495010223 40223395 
110232856 856929227 25543992 957121494 359385967 533951841 449476607 
134830774
OUTPUT FOR THIS TEST CASE: 5

S = 999865497 N = 7

1 267062069 637323855 219276511 404376890 528753603 199747292
OUTPUT FOR THIS TEST CASE: 1129042

S = 1000000000 N = 40

1 12 123 4 5 678 7 8 9 10 400 25 23 1000 67 98 33 46 79 896 11 112 1223 412 
532 6781 17 18 19 170 1400 925 723 11000 607 983 313 486 739 896
OUTPUT FOR THIS TEST CASE: 90910

What do you mean by the "classic dp coin problem technique?" Did you try this one https://en.wikipedia.org/wiki/Change-making_problem#Optimal_Substructure? — גלעד ברקן, Jul 12 '17 at 16:30
Could you please post your code here? The used memory should not be bigger than factor of S and this should not be a big issue, — Saeed Amiri, Jul 12 '17 at 16:36
I have a work around for the flaw in my previous answer. As you stated it was Greedy. So a tweak can be made such that it doesn't stop after a list has been found. It shall search for a new list, if this new list is greater or equal in size than the previous, discard, else discard the previous list and store this new one. But in this I am sure I shall exceed time limit but space is managed well — Pushan Gupta, Jul 12 '17 at 19:30
I see what you mean but how are we going to keep the lists, they request a lot of memory — someone12321, Jul 12 '17 at 21:46
Not much memory would be required. You just have to keep a single list. And at most there are 100 elements in the worst case. So that would mean 400 bytes for 4bytes per int. And plus an auxiliary list. So a total of 800 bytes — Pushan Gupta, Jul 13 '17 at 08:25
Can u post pseudo code for your solution, because I dont really see how it works — someone12321, Jul 13 '17 at 08:28
Try this: https://gist.github.com/anonymous/c53125efcacc1a76c1074f229f80b383 And next time please use @ to poke!. If that works I shall post a psedo code with explanation — Pushan Gupta, Jul 13 '17 at 09:47
@Vidor Vistrom I've read your code but I'm having a lot of trouble seeing what it does (there are no comments) and I'm not sure I see why you switch between `int` and `Integer`. Perhaps you should write up a detailed explanation? — templatetypedef, Jul 13 '17 at 15:03
Yeah I just wanted to make sure it is acceptable? If yes, I shall update my previous post with complete explanation. Just let me know in either case — Pushan Gupta, Jul 13 '17 at 15:30
Do you have any test data sets? I really question whether the general problem is solvable within these time and space bounds, and I would like to see what the toughest actual cases look like. In particular if the A(N)'s are all less than 10^6 it should be tractable. — RBarryYoung, Jul 13 '17 at 16:04
@VidorVistrom I think it's probably best for you to provide your own test cases and explanation about how your solution works rather than expecting the community to validate it. — templatetypedef, Jul 13 '17 at 16:43
@templatetypedef I am not asking the community to validate it. In fact, I am asking the OP to check his online judge and validate. Okay I shall explain what it does. Give me a moment. — Pushan Gupta, Jul 13 '17 at 16:45
@templatetypedef I have updated the answer and the code(provided in the answer itself). You may see the explaination — Pushan Gupta, Jul 13 '17 at 17:53
@templatetypedef I have invited you for a discussion on chat please check inbox — Pushan Gupta, Jul 13 '17 at 18:22
I can give you all the test cases you want, I have the original ones — someone12321, Jul 14 '17 at 14:47
@VidorVistrom Yes, given that the "classic DP" algorithm is `O(N*S)`, it's pretty hard to believe it could solve that first test case in under 1 Second even *with* unlimited memory since that's at least 100 Billion operations. — RBarryYoung, Jul 14 '17 at 17:47
If we are talking about standard dp coin problem it works only for small cases — someone12321, Jul 14 '17 at 17:47
@someone12321 Then what's the *real* time limit? Because 1 second seems well beyond what's attainable even with unlimited memory. — RBarryYoung, Jul 14 '17 at 17:52
Limits are given in the question, 1 secund for time and 256 mb of memory — someone12321, Jul 14 '17 at 17:57
Tell me honestly, its one those google foobar questions? If you found this on industry training, then they have must provided solution too? — Pushan Gupta, Jul 14 '17 at 18:27
I dont know what is google foobar, but this question is from competition in informatics for high school students younger of 16 years. Participating countries are from balkan and only couple european countries — someone12321, Jul 14 '17 at 18:30
Btw they uploaded the test cases but they still havent published the correct solution — someone12321, Jul 14 '17 at 19:42
Well, I want to meet the high school kids that can solve this one. I think that I have an answer, but it is crazy hard. — RBarryYoung, Jul 15 '17 at 16:47
OK, I do have a solution that meets both memory and run-time requirements for all the test cases. I will post it as soon as I have cleaned it up, and removed the unnecessary test optimizations. — RBarryYoung, Jul 15 '17 at 17:03
I'm excited to see your solution! BTW, only one kid out from more than 80 managed to solve for all points. — someone12321, Jul 15 '17 at 17:44
@templatetypedef FYI, I have provided an analysis of my answer. — RBarryYoung, Jul 19 '17 at 20:17
@templatetypedef or user12321, this might be the time to award that bounty before it expires. Or mark a response as the answer. — RBarryYoung, Jul 21 '17 at 12:56

RBarryYoung · Answer 1 · 2017-07-19T20:15:37.920

(NOTE: Updated and edited for clarity. Complexity Analysis added at the end.)

OK, here is my solution, including my fixes to the performance issues found by @PeterdeRivaz. I have tested this against all of the test cases provided in the question and the comments and it finishes all in under a second (well, 1.5s in one case), using primarily only the memory for the partial results cache (I'd guess about 16MB).

Rather than using the traditional DP solution (which is both too slow and requires too much memory), I use a Depth-First, Greedy-First combinatorial search with pruning using current best results. I was surprised (very) that this works as well as it does, but I still suspect that you could construct test sets that would take a worst-case exponential amount of time.

First there is a master function that is the only thing that calling code needs to call. It handles all of the setup and initialization and calls everything else. (all code is C#)

// Find the min# of coins for a specified sum
int CountChange(int targetSum, int[] coins)
{
    // init the cache for (partial) memoization
    PrevResultCache = new PartialResult[1048576];

    // make sure the coins are sorted lowest to highest
    Array.Sort(coins);

    int curBest = targetSum;
    int result = CountChange_r(targetSum, coins, coins.GetLength(0)-1, 0, ref curBest);

    return result;
}

Because of the problem test-cases raised by @PeterdeRivaz I have also added a partial results cache to handle when there are large numbers in N[] that are close together.

Here is the code for the cache:

    // implement a very simple cache for previous results of remainder counts
    struct PartialResult
    {
        public int PartialSum;
        public int CoinVal;
        public int RemainingCount;
    }
    PartialResult[] PrevResultCache;

    // checks the partial count cache for already calculated results
    int PrevAddlCount(int currSum, int currCoinVal)
    {
        int cacheAddr = currSum & 1048575;  // AND with (2^20-1) to get only the first 20 bits
        PartialResult prev = PrevResultCache[cacheAddr];

        // use it, as long as it's actually the same partial sum 
        // and the coin value is at least as large as the current coin
        if ((prev.PartialSum == currSum) && (prev.CoinVal >= currCoinVal))
        {
            return prev.RemainingCount;
        }
        // otherwise flag as empty
        return 0;
    }

    // add or overwrite a new value to the cache
    void AddPartialCount(int currSum, int currCoinVal, int remainingCount)
    {
        int cacheAddr = currSum & 1048575;  // AND with (2^20-1) to get only the first 20 bits
        PartialResult prev = PrevResultCache[cacheAddr];

        // only add if the Sum is different or the result is better
        if ((prev.PartialSum != currSum)
            || (prev.CoinVal <= currCoinVal)
            || (prev.RemainingCount == 0)
            || (prev.RemainingCount >= remainingCount)
            )
        {
            prev.PartialSum = currSum;
            prev.CoinVal = currCoinVal;
            prev.RemainingCount = remainingCount;
            PrevResultCache[cacheAddr] = prev;
        }
    }

And here is the code for the recursive function that does the actual counting:

/*
* Find the minimum number of coins required totaling to a specifuc sum
* using a list of coin denominations passed.
*
* Memory Requirements: O(N)  where N is the number of coin denominations
*                            (primarily for the stack)
* 
* CPU requirements:  O(Sqrt(S)*N) where S is the target Sum
*                           (Average, estimated.  This is very hard to figure out.)
*/
int CountChange_r(int targetSum, int[] coins, int coinIdx, int curCount, ref int curBest)
{
    int coinVal = coins[coinIdx];
    int newCount = 0;

    // check to see if we are at the end of the search tree (curIdx=0, coinVal=1)
    // or we have reached the targetSum
    if ((coinVal == 1) || (targetSum == 0))
    {
        // just use math get the final total for this path/combination 
        newCount = curCount + targetSum;
        // update, if we have a new curBest
        if (newCount < curBest) curBest = newCount;
        return newCount;
    }

    // prune this whole branch, if it cannot possibly improve the curBest
    int bestPossible = curCount + (targetSum / coinVal);
    if (bestPossible >= curBest) 
            return bestPossible; //NOTE: this is a false answer, but it shouldnt matter
                                    //  because we should never use it.

    // check the cache to see if a remainder-count for this partial sum
    // already exists (and used coins at least as large as ours)
    int prevRemCount = PrevAddlCount(targetSum, coinVal);
    if (prevRemCount > 0)
    {
        // it exists, so use it
        newCount = prevRemCount + targetSum;
        // update, if we have a new curBest
        if (newCount < curBest) curBest = newCount;
        return newCount;
    }

    // always try the largest remaining coin first, starting with the 
    // maximum possible number of that coin (greedy-first searching)
    newCount = curCount + targetSum;
    for (int cnt = targetSum / coinVal; cnt >= 0; cnt--)
    {
        int tmpCount = CountChange_r(targetSum - (cnt * coinVal), coins, coinIdx - 1, curCount + cnt, ref curBest);

        if (tmpCount < newCount) newCount = tmpCount;
    }

    // Add our new partial result to the cache
    AddPartialCount(targetSum, coinVal, newCount - curCount);

    return newCount;
}

Analysis:

Memory:

Memory usage is pretty easy to determine for this algorithm. Basiclly there's only the partial results cache and the stack. The cache is fixed at appx. 1 million entries times the size of each entry (3*4 bytes), so about 12MB. The stack is limited to O(N), so together, memory is clearly not a problem.

CPU:

The run-time complexity of this algorithm starts out hard to determine and then gets harder, so please excuse me because there's a lot of hand-waving here. I tried to search for an analysis of just the brute-force problem (combinatorial search of sums of N*kn base values summing to S) but not much turned up. What little there was tended to say it was O(N^S), which is clearly too high. I think that a fairer estimate is O(N^(S/N)) or possibly O(N^(S/AVG(N)) or even O(N^(S/(Gmean(N))) where Gmean(N) is the geometric mean of the elements of N[]. This solution starts out with the brute-force combinatorial search and then improves it with two significant optimizations.

The first is the pruning of branches based on estimates of the best possible results for that branch versus what the best result it has already found. If the best-case estimators were perfectly accurate and the work for branches was perfectly distributed, this would mean that if we find a result that is better than 90% of the other possible cases, then pruning would effectively eliminate 90% of the work from that point on. To make a long story short here, this should work out that the amount of work still remaining after pruning should shrink harmonically as it progress. Assuming that some kind of summing/integration should be applied to get a work total, this appears to me to work out to a logarithm of the original work. So let's call it O(Log(N^(S/N)), or O(N*Log(S/N)) which is pretty darn good. (Though O(N*Log(S/Gmean(N))) is probably more accurate).

However, there are two obvious holes with this. First, it is true that the best-case estimators are not perfectly accurate and thus they will not prune as effectively as assumed above, but, this is somewhat counter-balanced by the Greedy-First ordering of the branches which gives the best chances for finding better solutions early in the search which increase the effectiveness of pruning.

The second problem is that the best-case estimator works better when the different values of N are far apart. Specifically, if |(S/n2 - S/n1)| > 1 for any 2 values in N, then it becomes almost perfectly effective. For values of N less than SQRT(S), then even two adjacent values (k, k+1) are far enough apart that that this rule applies. However for increasing values above SQRT(S) a window opens up so that any number of N-values within that window will not be able to effectively prune each other. The size of this window is approximately K/SQRT(S). So if S=10^9, when K is around 10^6 this window will be almost 30 numbers wide. This means that N[] could contain 1 plus every number from 1000001 to 1000029 and the pruning optimization would provide almost no benefit.

To address this, I added the partial results cache which allows memoization of the most recent partial sums up to the target S. This takes advantage of the fact that when the N-values are close together, they will tend to have an extremely high number of duplicates in their sums. As best as I can figure, this effectiveness is approximately the N times the J-th root of the problem size where J = S/K and K is some measure of the average size of the N-values (Gmean(N) is probably the best estimate). If we apply this to the brute-force combinatorial search, assuming that pruning is ineffective, we get O((N^(S/Gmean(N)))^(1/Gmean(N))), which I think is also O(N^(S/(Gmean(N)^2))).

So, at this point take your pick. I know this is really sketchy, and even if it is correct, it is still very sensitive to the distribution of the N-values, so lots of variance.

How do you get the CPU requirements? I find this solution is very slow if I have a high target (e.g. 1 billion) and several input values close together (e.g. 10million+1,10million+2,...,10million+10) — Peter de Rivaz, Jul 15 '17 at 20:03
@PeterdeRivaz Yep, I tried that one and its really slow, which I actually figured. For medium-big values close together the best-estimate/pruning trick doesn't work well, so it goes exponential. This was why I insisted on seeing the actual test cases, I am still not convinced that the general case is solvable in these bounds, but a large number of categories of test-cases can be solved quickly with this approach. — RBarryYoung, Jul 15 '17 at 20:50
@PeterdeRivaz NOTE, I have updated my answer with a solution to this problem. Please let me know of any additional issues. — RBarryYoung, Jul 16 '17 at 19:36
How about `394842710, [1, 19599271, 45306791, 18186221, 4297625, 14883645, 35852124, 7563775, 1168781, 10777798, 32662761, 38535143, 48208183, 15900004, 9561325, 43048939, 31774586, 19646919, 46765642, 1272670, 34114210, 12839796, 49118670, 16061227, 47112687, 36574013, 7055028, 22182018, 2940844, 21237332, 43977109, 49740418, 16093741, 17505128, 40015993, 11030779, 46201395, 3999146, 2728890, 44503665, 44896360, 7930227, 36737527, 13875589, 43225195, 19872983, 30884901, 23112776, 44523696, 18955480, 39904879, 9120011, 10315159, 44860419, 7052437, 40886301, 5541215, 44693355]` — maniek, Jul 16 '17 at 22:05
@maniek Just curious, how did you come up with this test-set? It was by far the hardest one I tested and I would like to be able to generate other hard test-sets. — RBarryYoung, Jul 02 '18 at 20:53
Choose parameters X, Y, Z. generate set A of X random numbers, each smaller than Y. Add the number 1 to the set. Get a subset of A, with size Z. set S as the sum of the subset. — maniek, Jul 04 '18 at 01:15

maxim1000 · Answer 2 · 2017-07-18T20:14:56.923

3

[I've replaced the previous idea about bit operations because it seems to be too time consuming]

A bit crazy idea and incomplete but may work.

Let's start with introducing f(n,s) which returns number of combinations in which s can be composed from n coins.

Now, how f(n+1,s) is related to f(n)?

One of possible ways to calculate it is:

f(n+1,s)=sum[coin:coins]f(n,s-coin)

For example, if we have coins 1 and 3,

f(0,)=[1,0,0,0,0,0,0,0] - with zero coins we can have only zero sum

f(1,)=[0,1,0,1,0,0,0,0] - what we can have with one coin

f(2,)=[0,0,1,0,2,0,1,0] - what we can have with two coins

We can rewrite it a bit differently:

f(n+1,s)=sum[i=0..max]f(n,s-i)*a(i)

a(i)=1 if we have coin i and 0 otherwise

What we have here is convolution: f(n+1,)=conv(f(n,),a)

https://en.wikipedia.org/wiki/Convolution

Computing it as definition suggests gives O(n^2)

But we can use Fourier transform to reduce it to O(n*log n).

https://en.wikipedia.org/wiki/Convolution#Convolution_theorem

So now we have more-or-less cheap way to find out what numbers are possible with n coins without going incrementally - just calculate n-th power of F(a) and apply inverse Fourier transform.

This allows us to make a kind of binary search which can help handling cases when the answer is big.

As I said the idea is incomplete - for now I have no idea how to combine bit representation with Fourier transforms (to satisfy memory constraint) and whether we will fit into 1 second on any "regular" CPU...

edited Jul 18 '17 at 20:14

answered Jul 16 '17 at 05:50

maxim1000

6,297
1
23
19

How do you go from a list of sums we can make to the minimum number of coins needed to make the sum? – templatetypedef Jul 16 '17 at 07:17
the number of the coins is just an index of our iteration - once we found that number set contains the needed number we output index of iteration – maxim1000 Jul 16 '17 at 07:48
I think that you might be on to something here, but there's way too much ambiguity and term-shifting over what you mean by `f(n,s)` and `a(i)`. If these are supposed to be recursively defined sequences or functions, then you need to at least describe their base or terminating cases. – RBarryYoung Jul 17 '17 at 20:27
@RBarryYoung, they are just functions (or arrays), but maybe it's better to call f(n,s) as fn(s), so that it's clear for which arrays we calculate Fourier transform – maxim1000 Jul 18 '17 at 05:28
The issue isn't arrays vs functions, etc. The issue is that you haven't defined their contents/returns well enough that we can figure out what they are supposed to represent/contain. – RBarryYoung Jul 18 '17 at 13:30
@RBarryYoung, a[i] is defined quite clearly I think, and f(n+1,s) can be calculated from f(n,s) (the first formula with sum) – maxim1000 Jul 18 '17 at 19:50
Neither is clear to me. I suggest taking a small example and demonstrating how they would be used. – RBarryYoung Jul 18 '17 at 20:00
@RBarryYoung, I guess I found more clear definition for f(n,s) than "zero or something non-zero" :) – maxim1000 Jul 18 '17 at 20:15

Memory-constrained coin changing for numbers up to one billion

2 Answers2

Analysis:

Memory:

CPU:

Linked