12

I saw a question on a math forum a while back where a person was discussing adding up the digits in a number over and over again until a single digit is achieved. (i.e. "362" would become "3+6+2", which would become "11"... then "11" would become "1+1", which would become "2", therefore "362" would return 2.) I wrote some nice code to get an answer to this and posted it, only to be outdone by a user who suggested that any number modulo 9 is equal to this "infinite digit sum". I checked it and he was right... well, almost right: if zero was returned you had to switch it out with a "9", but that was a very quick fix...

362 = 3+6+2 = 11 = 1+1 = 2

or...

362%9 = 2

Anyways, the mod 9 method works fantastically for repeatedly summing the digits until you are left with just a single digit... but what about only doing it once (i.e. 362 would just return "11")? Can anyone think of fast algorithms?

Albert Renshaw
  • Out of curiosity - Was an explanation provided as to why `%9` works? – Bernhard Barker Feb 19 '13 at 05:28
  • @Dukeling No, I'd assume it has something to do with the fact that we are in a base 10 number system and that's the highest single digit numeral before the `place` (e.g. tens place, hundreds place, etc.) resets... In a base 6 number system I would assume this would work with modulo 5. – Albert Renshaw Feb 19 '13 at 05:31
  • Probably a simple iteration - keep taking the quotient and remainder dividing by ten, summing the remainders and taking the quotient as the value to check in the next iteration, and stop when the quotient is zero. –  Feb 19 '13 at 05:31
  • Dividing by 10 and summing the remainder is what I am currently doing, I am wondering if anyone knows cool math tricks for it though :o) – Albert Renshaw Feb 19 '13 at 05:32
  • @Dukeling: 10^k mod 9 == 1 for any non-negative integer k. So for integers d1, d2, ... and e1, e2, ..., we have d1*10^e1 + d2*10^e2 + ... mod 9 = d1*10^e1 mod 9 + d2*10^e2 mod 9 + ... = d1*1 mod 9 + d2*1 mod 9 + ... = d1 + d2 + ... mod 9. Here, d1... are the digits of the number, and e1... are the exponents of 10 corresponding to each digit position. There is nothing special about 10 and 9; it will work for any base b and corresponding modulus b-1. – rici Feb 19 '13 at 05:33
  • @AlbertRenshaw: You are right with your guess. More rigorously, a number `N` in base `b` (b > 2) is simply `Sum[exp=0..inf](digit*b^exp)`, we know `b^exp mod (b - 1) = 1`, therefore `N mod (b - 1) = Sum[exp=0..inf](digit) mod (b - 1)`. – nhahtdh Feb 19 '13 at 05:34
  • @Dukeling The modulo 9 thing makes sense when you are working with relatively low numbers... like 8 sums to 8, 9 sums to 9, 10 sums to 1, 11 sums to 2, so you can see a pattern emerge that the sum just keeps looping 1-9, 1-9, 1-9... but with higher numbers that's when I get confused as to why it works... when there are more iterations, like 362 = 11 = 2. – Albert Renshaw Feb 19 '13 at 05:34
  • No, it's not about patterns. Take an example and see. Now, 365 = 3*10^2 + 6*10^1 + 5*10^0 ... now as @Dukeling mentioned, 10^y mod 9 = 1. So even if you multiply this 1 by any other number, you will still get that number. I.e. 365 = 300 + 60 + 5 and you already know what the modulo of each returns. The same will apply to any number N in a base system of b. This also answers why, when you get a zero, you need to replace it by 9 (9 mod 9 = 0). – Ricketyship Feb 19 '13 at 05:53
  • This `repeated digital sum` is also called the [`digital root`](http://en.wikipedia.org/wiki/Digital_root). The "mod 9" trick is just the congruence formula. – Blastfurnace Feb 19 '13 at 06:10
  • This a known divisibility criteria by 9: http://en.wikipedia.org/wiki/Divisibility_rule – Bogdan Feb 19 '13 at 06:12
  • Does it have to be mathematical or could we do this: `var a=0; b="362".split(''); for (i in b) {a += Number(b[i])};` – גלעד ברקן Feb 19 '13 at 06:16
  • @groovy That is cool! I didn't think of making an array! and no, it doesn't have to be mathematical. – Albert Renshaw Feb 19 '13 at 06:18
  • @Bogdan I assumed it was well known already. – Albert Renshaw Feb 19 '13 at 06:18

5 Answers

7

There's a cool trick for summing the 1 digits of a fixed-width binary integer. At each iteration, you separate the digits into two values using masks, bit-shift one value down, then add. In the first iteration you separate every other digit, in the second pairs of digits, and so on.

Given that 27 is 00011011 as 8-bit binary, the process is...

00010001 + 00000101 = 00010110  <-  every other digit step
00010010 + 00000001 = 00010011  <-  pairs of digits
00000011 + 00000001 = 00000100  <-  quads, giving final result 4

You could do a similar trick with decimal, but it would be less efficient than a simple loop unless you had a direct representation of decimal numbers with fast operations to zero out selected digits and to do digit-shifting. So for 12345678 you get...

02040608 + 01030507 = 03071115  <-  every other digit
00070015 + 00030011 = 00100026  <-  pairs
00000026 + 00000010 = 00000036  <-  quads, final result

So 1+2+3+4+5+6+7+8 = 36, which is correct, but you can only do this efficiently if your number representation is fixed-width decimal. It always takes lg(w) iterations, where w is the number of digits, lg means the base two logarithm, and you round upwards.

To expand on this a little (based on in-comments discussions), let's pretend this was sane, for a bit...

If you count single-digit additions, there's actually more work than a simple loop here. The idea, as with the bitwise trick for counting bits, is to re-order those additions (using associativity) and then to compute as many as possible in parallel, using a single full-width addition to implement two half-width additions, four quarter-width additions etc. There's significant overhead for the digit-clearing and digit-shifting operations, and even more if you implement this as a loop (calculating or looking up the digit-masking and shift-distance values for each step). The "loop" should probably be fully unrolled and those masks and shift-distances be included as constants in the code to avoid that.

A processor with support for Binary Coded Decimal (BCD) could handle this. Digit masking and digit shifting would be implemented using bit masking and bit shifting, as each decimal digit would be encoded in 4 (or more) bits, independent of the encoding of other digits.

One issue is that BCD support is quite rare these days. It used to be fairly common in the 8 bit and 16 bit days, but as far as I'm aware, processors that still support it now do so mainly for backward compatibility. Reasons include...

  1. Very early processors didn't include hardware multiplication and division. Now that these operations are supported in hardware, converting between binary and decimal is easy and efficient, so binary is used for almost everything and BCD is mostly forgotten.

  2. There are decimal number representations around in libraries, but few if any high-level languages ever provided portable support for hardware BCD, so once assembler stopped being a real-world option for most developers, BCD support simply stopped being used.

  3. As numbers get larger, even packed BCD is quite inefficiently packed. Number representations in base 10^x keep the most important properties of base 10 and are easily decoded as decimal. Base 1000 needs only 10 bits per three digits, not 12, because 2^10 is 1024. That's enough to give you an extra decimal digit in 32 bits - 9 digits instead of 8 - with 2 bits still left over, e.g. for a sign bit.

The thing is, for this digit-totalling algorithm to be worthwhile at all, you need to be working with fixed-width decimal of probably at least 32 bits (8 digits). That gives 12 operations (6 masks, 3 shifts, 3 additions) rather than 15 additions for the (fully unrolled) simple loop. That's a borderline gain, though - and other issues in the code could easily mean it's actually slower.

The efficiency gain is clearer at 64 bits (16 decimal digits) as there's still only 16 operations (8 masks, 4 shifts, 4 additions) rather than 31, but the odds of finding a processor that supports 64-bit BCD operations seems slim. And even if you did, how often do you need this anyway? It seems unlikely that it could be worth the effort and loss of portability.

  • `1+2+3+4+5+6+7+8` -> 7 addition operations. Yours -> 8 addition operations. Just saying... – Bernhard Barker Feb 19 '13 at 06:00
  • @Dukeling - as usual, an asymptotic gain is only really a gain when your problem is sufficiently large. If your fixed-width numbers have a large enough width, you'll gain from it being O(log n) rather than O(n). But I screwed up working those examples out the first time manually as it is. If you know all your numbers will be sufficiently small, the fastest solution is a lookup-table of pre-computed results. You could even use that to optimise this by working base 100 or base 10000 at the start - bypassing a couple of iterations, in effect. –  Feb 19 '13 at 06:03
  • @Dukeling - that said, my example only has three *addition* operations for 8 digits - just saying ;-) –  Feb 19 '13 at 06:09
  • * single digit addition operations. Admittedly not really the most significant (but once we go beyond the size of the largest integer primitive in high-level languages, it does matter). In high-level languages you'll probably struggle to get more efficient than just adding the individual digits. – Bernhard Barker Feb 19 '13 at 06:27
  • @Dukeling - OK, you're right there - it would only make sense with a fixed-width *hardware* decimal representation, along the lines of those binary-coded decimals that used to be supported by processors. I doubt they bother these days except maybe for back-compatibility though. And even then, you need to account for the other extra operations (masking and shifting) so there's really no practical value to this at all - sixteen-plus decimal digit hardware BCD is going to be *very* hard to find. –  Feb 19 '13 at 06:33
  • @Dukeling - actually, thinking about it, the same number of additions done by the simple iteration must always be done here - along with other additions and other operations - just using associativity of addition to re-order them. The point of the bitwise trick is to run several additions in parallel - e.g. using an 8-bit addition to give the effect of four two-bit additions or two four-bit additions. –  Feb 19 '13 at 06:55
0

Here's something in Haskell:

sumDigits n = 
  if n == 0 
     then 0
     else let a = mod n 10
          in a + sumDigits (div n 10)

Oh, but I just read you're doing that already...

(then there's also the obvious:

sumDigits n = sum $ map (read . (:[])) . show $ n

)

גלעד ברקן
0

For short code, try this:

int digit_sum(int n){
    if (n<10) return n;
    return n%10 + digit_sum(n/10);
}

Or, in words,

-If the number is less than ten, then the digit sum is the number itself.

-Otherwise, the digit sum is the current last digit (a.k.a. n mod 10, or n%10), plus the digit sum of everything to the left of that digit (n divided by 10, using integer division).

-This algorithm can also be generalized for any base, substituting that base in for 10.

  • Since the question asked for the fastest method, not just a method that works, can you please provide evidence that this is the fastest method? – Chai T. Rex Jan 08 '23 at 22:51
0

The digit sum can also be computed as

n - 9 * (n / 10 + n / 100 + n / 1000 + ...)

where every division is an integer division. (This works because each digit d at position i contributes d * (10^i - 1)/9 to the bracketed sum, so subtracting nine times the sum from n leaves exactly the digit sum.) It leads to the following algorithm:

unsigned digit_sum(unsigned n) {
    unsigned k = n;
    unsigned s = 0;

    while (k >= 10) {
        k /= 10;
        s += k;
    }

    return n - s * 9;
}

This uses one fewer modulo operation per iteration compared to the standard approach. It can also be vectorized more easily. Here's an AVX example that computes the digit sum of a 32-bit integer:

#include <immintrin.h>
#include <stdint.h>

uint32_t digit_sum(uint32_t n) {
    uint32_t lo = n % 100000;
    uint32_t hi = n / 100000;
    __m256 v = _mm256_set1_ps(lo);
    __m256 t = _mm256_set1_ps(hi);
    __m256 d = _mm256_set_ps(1e-1, 1e-2, 1e-3, 1e-4,
                             1e-1, 1e-2, 1e-3, 1e-4);

    v = _mm256_permute2f128_ps(v, t, 3);
    v = _mm256_mul_ps(v, d);
    v = _mm256_floor_ps(v);

    t = _mm256_permute2f128_ps(v, v, 1);
    v = _mm256_add_ps(v, t);
    v = _mm256_hadd_ps(v, v);
    v = _mm256_hadd_ps(v, v);

    return lo + hi - 9 * (uint32_t) _mm256_cvtss_f32(v);
}
nwellnhof
-1
int digit_sum(int n) {
    if (n < 10) return n;
    return n % 10 + digit_sum(n / 10);
}
HemChe
Arun
  • Since the question asked for the fastest method, not just a method that works, can you please provide evidence that this is the fastest method? – Chai T. Rex Jan 08 '23 at 22:51