(NOTE: Updated and edited for clarity. Complexity Analysis added at the end.)
OK, here is my solution, including my fixes to the performance issues found by @PeterdeRivaz. I have tested this against all of the test cases provided in the question and the comments and it finishes all in under a second (well, 1.5s in one case), using primarily only the memory for the partial results cache (I'd guess about 16MB).
Rather than using the traditional DP solution (which is both too slow and requires too much memory), I use a Depth-First, Greedy-First combinatorial search with pruning using current best results. I was surprised (very) that this works as well as it does, but I still suspect that you could construct test sets that would take a worst-case exponential amount of time.
First there is a master function that is the only thing that calling code needs to call. It handles all of the setup and initialization and calls everything else. (all code is C#)
// Find the min# of coins for a specified sum
int CountChange(int targetSum, int[] coins)
{
// init the cache for (partial) memoization
PrevResultCache = new PartialResult[1048576];
// make sure the coins are sorted lowest to highest
Array.Sort(coins);
int curBest = targetSum;
int result = CountChange_r(targetSum, coins, coins.GetLength(0)-1, 0, ref curBest);
return result;
}
Because of the problem test-cases raised by @PeterdeRivaz I have also added a partial results cache to handle when there are large numbers in N[] that are close together.
Here is the code for the cache:
// implement a very simple cache for previous results of remainder counts
struct PartialResult
{
public int PartialSum;
public int CoinVal;
public int RemainingCount;
}
PartialResult[] PrevResultCache;
// checks the partial count cache for already calculated results
int PrevAddlCount(int currSum, int currCoinVal)
{
int cacheAddr = currSum & 1048575; // AND with (2^20-1) to get only the first 20 bits
PartialResult prev = PrevResultCache[cacheAddr];
// use it, as long as it's actually the same partial sum
// and the coin value is at least as large as the current coin
if ((prev.PartialSum == currSum) && (prev.CoinVal >= currCoinVal))
{
return prev.RemainingCount;
}
// otherwise flag as empty
return 0;
}
// add or overwrite a new value to the cache
void AddPartialCount(int currSum, int currCoinVal, int remainingCount)
{
int cacheAddr = currSum & 1048575; // AND with (2^20-1) to get only the first 20 bits
PartialResult prev = PrevResultCache[cacheAddr];
// only add if the Sum is different or the result is better
if ((prev.PartialSum != currSum)
|| (prev.CoinVal <= currCoinVal)
|| (prev.RemainingCount == 0)
|| (prev.RemainingCount >= remainingCount)
)
{
prev.PartialSum = currSum;
prev.CoinVal = currCoinVal;
prev.RemainingCount = remainingCount;
PrevResultCache[cacheAddr] = prev;
}
}
And here is the code for the recursive function that does the actual counting:
/*
* Find the minimum number of coins required totaling to a specifuc sum
* using a list of coin denominations passed.
*
* Memory Requirements: O(N) where N is the number of coin denominations
* (primarily for the stack)
*
* CPU requirements: O(Sqrt(S)*N) where S is the target Sum
* (Average, estimated. This is very hard to figure out.)
*/
int CountChange_r(int targetSum, int[] coins, int coinIdx, int curCount, ref int curBest)
{
int coinVal = coins[coinIdx];
int newCount = 0;
// check to see if we are at the end of the search tree (curIdx=0, coinVal=1)
// or we have reached the targetSum
if ((coinVal == 1) || (targetSum == 0))
{
// just use math get the final total for this path/combination
newCount = curCount + targetSum;
// update, if we have a new curBest
if (newCount < curBest) curBest = newCount;
return newCount;
}
// prune this whole branch, if it cannot possibly improve the curBest
int bestPossible = curCount + (targetSum / coinVal);
if (bestPossible >= curBest)
return bestPossible; //NOTE: this is a false answer, but it shouldnt matter
// because we should never use it.
// check the cache to see if a remainder-count for this partial sum
// already exists (and used coins at least as large as ours)
int prevRemCount = PrevAddlCount(targetSum, coinVal);
if (prevRemCount > 0)
{
// it exists, so use it
newCount = prevRemCount + targetSum;
// update, if we have a new curBest
if (newCount < curBest) curBest = newCount;
return newCount;
}
// always try the largest remaining coin first, starting with the
// maximum possible number of that coin (greedy-first searching)
newCount = curCount + targetSum;
for (int cnt = targetSum / coinVal; cnt >= 0; cnt--)
{
int tmpCount = CountChange_r(targetSum - (cnt * coinVal), coins, coinIdx - 1, curCount + cnt, ref curBest);
if (tmpCount < newCount) newCount = tmpCount;
}
// Add our new partial result to the cache
AddPartialCount(targetSum, coinVal, newCount - curCount);
return newCount;
}
Analysis:
Memory:
Memory usage is pretty easy to determine for this algorithm. Basiclly there's only the partial results cache and the stack. The cache is fixed at appx. 1 million entries times the size of each entry (3*4 bytes), so about 12MB. The stack is limited to
O(N)
, so together, memory is clearly not a problem.
CPU:
The run-time complexity of this algorithm starts out hard to determine and then gets harder, so please excuse me because there's a lot of hand-waving here. I tried to search for an analysis of just the brute-force problem (combinatorial search of sums of N*kn base values summing to S) but not much turned up. What little there was tended to say it was
O(N^S)
, which is clearly too high. I think that a fairer estimate is
O(N^(S/N))
or possibly
O(N^(S/AVG(N))
or even
O(N^(S/(Gmean(N)))
where
Gmean(N)
is the geometric mean of the elements of N[]. This solution starts out with the brute-force combinatorial search and then improves it with two significant optimizations.
The first is the pruning of branches based on estimates of the best possible results for that branch versus what the best result it has already found. If the best-case estimators were perfectly accurate and the work for branches was perfectly distributed, this would mean that if we find a result that is better than 90% of the other possible cases, then pruning would effectively eliminate 90% of the work from that point on. To make a long story short here, this should work out that the amount of work still remaining after pruning should shrink harmonically as it progress. Assuming that some kind of summing/integration should be applied to get a work total, this appears to me to work out to a logarithm of the original work. So let's call it O(Log(N^(S/N))
, or O(N*Log(S/N))
which is pretty darn good. (Though O(N*Log(S/Gmean(N)))
is probably more accurate).
However, there are two obvious holes with this. First, it is true that the best-case estimators are not perfectly accurate and thus they will not prune as effectively as assumed above, but, this is somewhat counter-balanced by the Greedy-First ordering of the branches which gives the best chances for finding better solutions early in the search which increase the effectiveness of pruning.
The second problem is that the best-case estimator works better when the different values of N are far apart. Specifically, if |(S/n2 - S/n1)| > 1
for any 2 values in N, then it becomes almost perfectly effective. For values of N less than SQRT(S), then even two adjacent values (k, k+1) are far enough apart that that this rule applies. However for increasing values above SQRT(S) a window opens up so that any number of N-values within that window will not be able to effectively prune each other. The size of this window is approximately K/SQRT(S). So if S=10^9, when K is around 10^6 this window will be almost 30 numbers wide. This means that N[] could contain 1 plus every number from 1000001 to 1000029 and the pruning optimization would provide almost no benefit.
To address this, I added the partial results cache which allows memoization of the most recent partial sums up to the target S. This takes advantage of the fact that when the N-values are close together, they will tend to have an extremely high number of duplicates in their sums. As best as I can figure, this effectiveness is approximately the N times the J-th root of the problem size where J = S/K
and K is some measure of the average size of the N-values (Gmean(N) is probably the best estimate). If we apply this to the brute-force combinatorial search, assuming that pruning is ineffective, we get O((N^(S/Gmean(N)))^(1/Gmean(N)))
, which I think is also O(N^(S/(Gmean(N)^2)))
.
So, at this point take your pick. I know this is really sketchy, and even if it is correct, it is still very sensitive to the distribution of the N-values, so lots of variance.