count integers between large A and B having count of digit S

Question

I am trying to count Integers between Range A to B having digit sum S (assuming S=60) .

A and B ranges from 1 to 10^18.

let X be a number and Upto Y we have to Count Integers.

X = x1 x2 ... xn - 1 xn and Y = y1 y2 ... yn - 1 yn, where xi and yi are the decimal digits of X and Y.

leftmost_lo as the smallest i with xi < yi. We define leftmost_lo as n + 1 if there is no such i. Analogously, we define leftmost_hi as the as the smallest i with xi > yi, or n + 1 otherwise.

Function count is returning the number f(Y) of integers X with the property X ≤ Y and X has the digit sum 60.

Let n be the number of Y's digits and y[i] be the i-th decimal digit of Y according to the definition above. The following recursive algorithm solves the problem:

    count(i, sum_so_far, leftmost_lo, leftmost_hi):
       if i == n + 1:
       # base case of the recursion, we have recursed beyond the last digit
       # now we check whether the number X we built is a valid solution
        if sum_so_far == 60 and leftmost_lo <= leftmost_hi:
          return 1
        else: 
          return 0
     result = 0
     # we need to decide which digit to use for x[i]
     for d := 0 to 9
        leftmost_lo' = leftmost_lo
        leftmost_hi' = leftmost_hi
        if d < y[i] and i < leftmost_lo': leftmost_lo' = i
        if d > y[i] and i < leftmost_hi': leftmost_hi' = i
       result += count(i + 1, sum_so_far + d, leftmost_lo', leftmost_hi')
    return result






Compute the number f(Y) of integers X with the property X ≤ Y and X has the digit sum 60

Now we have f(Y) = count(1, 0, n + 1, n + 1) and we have solved the problem .The runtime

is O(n^4) for this particular implementation.

I understand this Concept From this link. How to count integers between large A and B with a certain property?

But not able to understand How to Optimize this.

Now How can I optimize this to O(n) solution for this particular problem.

Can any one please help me.

These types of counting problems are usually solved by applying mathematical and combinatorial formulas rather than actually evaluating each possible candidate. — Abhishek Bansal, Aug 30 '14 at 05:14
@user1990169 , yes ,you are can solve this problem by computing mathematical formula , but I am thinking about general approach to solve this problem . — Avneet, Aug 30 '14 at 06:46
Just FYI - I added an outline idea for arbitrary A and B, which could apply to either my or M Oehm's method. — גלעד ברקן, Aug 31 '14 at 14:05

score 2 · Answer 1 · answered Aug 31 '14 at 02:34

First, you can note that if you have a function F that returns the number of ints <= A with digit sum S, then the number of ints between A and B with digit sum S is F(B)-F(A-1).

Then defining some notation:

n(A) means the number consisting of all 9's with the same number of digits as A). For example, n(123) = 999.
A[0] means the left-most digit of A
A[1:] means A with the left-most digit removed.

You then have these relations, doing one digit at a time, and noting that the possibilities are either you match the first digit of A, or you put a lower digit there (and then for the recursive case you can replace A with all 9s).

F(S, A) = 1 if S = 0
F(S, A) = 0 if S < 0 or A = 0
otherwise F(S, A) =
    F(S-A[0], A[1:])
    + F(S-0, n(A[1:])) + F(S-1, n(A[1:])) + ... + F(S-A[0]-1, n(A[1:]))

This gives you this code (with a cache to avoid computing the same thing multiple times):

def count1(S, digits, nines, k, cache):
    if S <= 0 or k == len(digits): return S==0
    key = (S, nines, k)
    if key not in cache:
        dk = 9 if nines else digits[k]
        cache[key] = sum(count1(S-i, digits, nines or i<dk, k+1, cache)
                         for i in xrange(dk+1))
    return cache[key]

def count(S, A):
    return count1(S, map(int, str(A)), False, 0, {})

def count_between(S, A, B):
    return count(S, B) - count(S, A-1)

print count_between(88, 1, 10**10)

The cache ends up being at most size S * 2 * len(str(A)) and each thing is computed once, which gives you the complexity: O(S * log_10(A)).

גלעד ברקן · Answer 2 · 2014-09-03T00:49:51.857

For A=1 and B=10^18, generate all integer partitions of S that have less than 19 parts and where each part is less than 10. The answer is the sum of the number of distinct permutations of each partition as digits combined with (18 - number_of_parts) zeros.

For other A's and B's, there's slightly more math involved around the edges :)

For the range 1 to an arbitrary B, we can use a similar process, although with more enumerations:

Let's say B has digits b1 b2 ... bn - 1 bn. We decrement b1 and enumerate the partitions (with less than n parts, each part under 10) for the number S - (b1 - 1), and the cardinality of their distinct permutations when combined with (n - 1 - number_of_parts) zeros. We repeat this process up to and including b1 = 0 (here the maximum number of parts and leading zeros would be decremented by one). We then repeat a similar process for b2, but this time S is first decreased by b1. And so on, summing the results.

For an arbitrary A and B, we subtract f(A) from f(B).

JavaScript code:

function choose(n,k){
  if (k == 0 || n == k){
    return 1;
  }
  var product = n;
  for (var i=2; i<=k; i++){
    product *= (n + 1 - i) / i
  }
  return product;
}

function digits(n){
  var ds = [];
  while (n){
    ds.push(n % 10);
    n = Math.floor(n/10);
  }
  return ds.reverse()
}

function ps(n,maxParts){
  if (maxParts <= 0){
    return 0;
  }
  var result = 0;
  for (var i=9; i>=Math.floor(n/maxParts); i--){
    var r = [0,0,0,0,0,0,0,0,0,0,1]; // 11th place is number of parts
    r[i]++;
    result += _ps(n-i,r,i,1,maxParts);
  }
  return result;
}

function _ps(n,r,i,c,maxParts){
  if (n==0){
    return numPs(r,maxParts);
  } else if (c==maxParts || n<0){
    return 0;
  } else{
    var result = 0;
    for (var j=i; j>0; j--){
      var r0 = r.slice();
      r0[j]++;
      r0[10]++;
      result += _ps(n-j,r0,j,c+1,maxParts);
    }
    return result;
  }
}

function numPs(partition,n){
  var l = choose(n,n - partition[10]);
  n = partition[10];
  for (var i=0; i<10;i++){
    if (partition[i] != 0){
      l *= choose(n,partition[i]);
      n -= partition[i];
    }
  }
  return l;
}

function f(n,s){
  var ds = digits(n),
      n = ds.length,
      answer = 0;
  for (var i=0; i<n - 1; i++){
    if (ds[i] != 0){
      var d = ds[i] - 1;
      while (d >= 0){
        answer += ps(s - d,n - i - 1);
        d--;
      }
      s -= ds[i];
    }
  }
  if (s <= ds[n - 1]){
    answer++;
  }

  return answer;
}

Output:

console.log(f(1,1));
1

console.log(f(1000,3));
10

console.log(f(1001,3));
10

console.log(f(1002,3));
11

console.log(f(1003,3));
11

console.log(f(1010,3));
11

thanks buddy for your reply but i didn't understand your approach properly , what _ps , numPS , ps doing , didn't get that.Can you please explain it little bit more. — Avneet, Sep 24 '14 at 12:05
@Assians `_ps` is a recursive function initiated by `ps` - together they generate partitions for `s`, who's number of parts is bound by `maxParts`, and then return the cumulative number of distinct permutations of these partitions, calculated by the function `numPS`. For example if `s` were 3 and `n` 1000, the partitions would be `[1,1,1]` (1 permutation), `[1,2,0]` (6 permutations), `[3,0,0]` (3 permutations). Total is 10, just like the example in my answer. Make sense? — גלעד ברקן, Sep 24 '14 at 18:19

M Oehm · Accepted Answer · 2014-09-01T08:30:03.453

Edit Oh dear! Just after I have acknowledged that my answer misses the point, it got accepted. I've left the original answer as it is and explain the idea behind my algorithm and how it is inferior to the original solution after the fold.

This particular problem should be treated as "all integers with 18 digits" rather than "all integers between 1 and 10^18". (For the digit sum, numbers with fewer than 18 digits can be treated as 18-digit numbers with leading zeros.)

Then you can use an algorithm that spreads from the bottom up, just like the sieve of Erathostenes spreads across all non-primes.

Start with an array of digit counts dig for the digits 1 to 9 representing 0, i.e. all zeros. (The number of zeros can be calculated as 18 - sum(dig). Then you can recurse like this:

recurse(dig[], pos) {
    if (digitsum(dig) > 60) return;

    if (digitsum(dig) == 60) {
        count += poss(dig)
    } else {
        if (pos < 9) recurse(dig, pos + 1);
        if (sum(dig) < 18) {
            dig[pos]++;
            recurse(dig, pos);
            dig[pos]--;
        }  
    }
}

That way you treat all combinations of digit counts and return if you exceed 60. When you hit 60 exactly, you must calculate the number of possible numbers that correspond to that digit count:

poss(dig) = 18! / prod(dig[i]!)

The product of the factorials prod(dig[i]!)must include the factorial of zeros. (And, of course, 0! == 1.)

That approach should be fast enough if you keep track of the sum so far and precalculate the factorials. It doesn't work if you want to calculate all numbers with a digit sum of 60 between 50 and 5,000,000,000, say.

Addendum The framework that you have linked to can treat any range from A to B. Here, let's focus on ranges from 0 to 10^n, that is n-digit numbers where numbers with fewer digits are considered to have leading zeros.

The idea of my algorithm is not to enumerate all numbers, but to consider counts of digits. For example if my number has 5x the digit 9 and 3x the digit 5, the digit sum is 60. Now we have to find how many numbers of 18 digits satisfy that condition. 590,050,005,090,900,099 is one such number, but all unique permutations of this number are valid, too. The number has 18 - (5 + 3) = 10 zeros, hence this combination has

N(5x9, 3x5) = 18! / (5! * 3! * 10!)

permutations.

The algorithms must enumerate all permutations. It keeps track of the enumerations in an array, dig:

     ^ count
     |
2x   ...  ...  ...  ...  ...  ...  ...  ...  ... 

1x   ...  ...  ...  ...  ...  ...  ...  ...  ...  

0x   ...  ...  ...  ...  ...  ...  ...  ...  ...

     ---------------------------------------------> pos
      1    2    3    4    5    6    7    8    9

The case above would be

dig == [0, 0, 0, 0, 3, 0, 0, 0, 5]

To achieve this, it spreads in a zig-zag pattern. The current digit is called pos. It can either move vertically by incrementing the count of the current digit by one or it can move horizontally by considering the next digit. Recursion stops if the digit sum hits or exceed S or if pos goes beyond 9. Every time we hit S, we do our permutation calculation as shown above.

Because the array is passed by reference and is virtually the same array throughout we must decrement it after incrementing: We must clean up after tracking back.

This algorithm works and will find the answer to the number of all 18-digit numbers whose digit sum is 60 in the fraction of a second.

But it doesn't scale, because its running time grows exponentially. And also because you can calculate factorials for 18! with 64-bit integers, but after 20! you'll need big-integer arithmetic. (A clever implementation will be able to get further by simplifying the fraction N! / prod(dig[i]!), however.)

Now consider the code you posted. I've removed everything that calculates the ranges. The bare-bones version is:

ds_count(i, sum)
{
    if (sum > 60) return 0;

    if (i == 18) {
        if (sum == 60) return 1;
        return 0;
    }

    result = 0;
    for (d = 0; d < 10; d++) {
        result += ds_count(i + 1, sum + d);
    }

    return result;
}

This enumerates all 18-digit values. It stops short when the sum exceeds 60, but that's about it. This is not much better than a brute-force solution.

But this solution lends itself to memoizing. It will frequently be called with the same values, and it's easy to see why. For example the call ds_count(2, 5) will be called from 05..., 14..., 23..., 32..., 41... and 50.... (That reminds me a bit of the differently sized number chips in the board game Settlers of Catan, which account for the sum of two dice rolls.)

If you can determine one of these values, you can save it and effectively cut 5 calls to a tail of 16 digits short. So:

ds_count(i, sum)
{
    if (sum > 60) return 0;

    if (i == 18) {
        if (sum == 60) return 1;
        return 0;
    }

    if (defined(memo[i, sum])) return memo[i, sum];

    result = 0;
    for (d = 0; d < 10; d++) {
        result += ds_count(i + 1, sum + d);
    }

    memo[i, sum] = result;
    return result;
}

This is very fast and has no hard limitation like the factorial solution. It's also much easier to impelment, because its a recursive enumeration at heart.

It is interesting to note that my solution isn't suited to memoizing. (Except for memoizing the factorials, but that isn't the limiting factor here.) The point of the zig-zag count-set generation is to make only unique recursive calls. There's also a state, namely the digit set, which makes memoizing harder.

This is actually much more inefficient than the solution OP already has... From what I see this has a runtime exponential in n, rather than polynomial — Niklas B., Aug 30 '14 at 20:19
By the way, the problem asks for the count of numbers < X with that property, for a given X. Your approach only works if X = 10^k — Niklas B., Aug 30 '14 at 22:09
@NiklasB.: Touché. I failed to see that the solution uses memoisation, which is mentioned only in your (long) answer to your own question the OP linked to. As to your second comment: This particular question asks for the range 1 to 10^18. I thought applying your more general solution to this special case would be wasteful. (I have acknowledged this restriction of my algorithm in the post, though.) Good thing I don't make my living in the digitsum-counting business. — M Oehm, Aug 31 '14 at 05:45
@NiklasB.: I also haven't answered the question, which was not to find the solution to A=1, B=10^18, S=60, but to improve the algorithm. (Because I hadn't understood the algorithm in the first place. It is interesting how a brute-force solution that in its core visits all numbers in the range can be otimised when it lends itself to meoisation. So at least _I_ have learnt someting from the question, if something everone else seems to have known all along.) — M Oehm, Aug 31 '14 at 06:02
I'll delete this answer later. (Deleting right away without addressing the points raised in comments feels a bit like sneaking out the back door to me.) — M Oehm, Aug 31 '14 at 06:07
@M Oehm , thanks for your reply .sir , I can't understand your approach properly. What pos is denoting and why dig[pos]++ , dig[[pos]-- , can u please explain it more sir. — Avneet, Aug 31 '14 at 08:50
@Assians: I've updated my answer and explained my solution in more detail, including how it is less efficient than the original with memoizing. If you still had questions, you shouldn't have accepted my answer. Also, no "sir", please, just "you" is okay. — M Oehm, Aug 31 '14 at 10:59
@M Oehm thanks, I understand your both approaches. But I want to ask that why you only increasing dig[pos]++ , when sum(dig)<18 and why In first approach It doesn't work if I want to calculate all numbers with a digit sum of 60 between 50 and 5,000,000,000 as you mention . — Avneet, Aug 31 '14 at 22:08
Aha. That's an error in the pseudocode: `sum(dig) < 18` is the sum of the count of digits, and `sum(dig) == 60` should be `digitsum(dig)`, i.e. `sum(i*dig[i])`. I've fixed that. — M Oehm, Sep 01 '14 at 08:29
Since we presented similar ideas, I thought you might appreciate an extension of the same idea to arbitrary ranges (if it is of interest, please see my answer). — גלעד ברקן, Sep 03 '14 at 01:30

count integers between large A and B having count of digit S

3 Answers3