106

If I want to find the sum of the digits of a number, i.e.:

  • Input: 932
  • Output: 14, which is (9 + 3 + 2)

What is the fastest way of doing this?

I instinctively did:

sum(int(digit) for digit in str(number))

and I found this online:

sum(map(int, str(number)))

Which is best to use for speed, and are there any other methods which are even faster?

Georgy
  • 12,464
  • 7
  • 65
  • 73
SpFW
  • 1,099
  • 2
  • 8
  • 5

11 Answers11

135

Both lines you posted are fine, but you can do it purely in integers, and it will be the most efficient:

def sum_digits(n):
    s = 0
    while n:
        s += n % 10
        n //= 10
    return s

or with divmod:

def sum_digits2(n):
    s = 0
    while n:
        n, remainder = divmod(n, 10)
        s += remainder
    return s

Slightly faster is using a single assignment statement:

def sum_digits3(n):
   r = 0
   while n:
       r, n = r + n % 10, n // 10
   return r

> %timeit sum_digits(n)
1000000 loops, best of 3: 574 ns per loop

> %timeit sum_digits2(n)
1000000 loops, best of 3: 716 ns per loop

> %timeit sum_digits3(n)
1000000 loops, best of 3: 479 ns per loop

> %timeit sum(map(int, str(n)))
1000000 loops, best of 3: 1.42 us per loop

> %timeit sum([int(digit) for digit in str(n)])
100000 loops, best of 3: 1.52 us per loop

> %timeit sum(int(digit) for digit in str(n))
100000 loops, best of 3: 2.04 us per loop
wim
  • 338,267
  • 99
  • 616
  • 750
Pavel Anossov
  • 60,842
  • 14
  • 151
  • 124
  • May I need an explanation here on "while n:"? I do not know how Python understand when to stop. For example, sum of digits of 324 should be 3+2+4. For the last (the front digit three), in the while loop, 3/10=0 and then it becomes "while 0:". So, does while 0 means False to the loop and then escape the loop and return s? – nam Aug 18 '14 at 04:11
  • 3
    Yep, some things are equivalent to False in places that expect boolean values. See here: https://docs.python.org/2/library/stdtypes.html#truth-value-testing – Pavel Anossov Aug 18 '14 at 11:37
  • 1
    Is there way to find the sum of digits of odd sequence of integers by a formula? – Anoop Toffy Apr 09 '16 at 14:37
  • 6
    What's the value of `n` in your %timeit calls? – d8aninja Sep 27 '16 at 21:35
  • 1
    I think it's not the absence of augmented assignment which makes the third code faster, it's the use of *multiple assignment*. – kotchwane May 01 '21 at 20:21
  • 1
    I think you should mention that this is only more efficient for small integers. Use larger integers and the string conversion will eventually start to win - creating bigints for each of those divisions get very expensive. – wim Mar 22 '22 at 01:29
  • @Anoop : just use high-speed regex to delete every 2nd position (or set them to a zero) (either 1 call to gsub( ) or 2, depending on whether the particular regex engine has backreferences, which might actually be slower), then simply sum it up – RARE Kpop Manifesto Mar 22 '22 at 15:46
  • @wim on my machine, the breakeven point appears to be somewhere around only 20 decimal digits. (One might think that value is suspicious, and that hitting the threshold where a second 64-bit value is required in the underlying representation, makes a fair difference. However, I'm not seeing a significant discontinuity in timing results from that.) – Karl Knechtel Oct 09 '22 at 16:31
  • @KarlKnechtel Around 27 digits for me, and the [graphs look continuous](https://stackoverflow.com/a/71566286/674039). Which code where you using for the string domain approach? – wim Oct 09 '22 at 19:07
  • I was using the `sum(map(int` approach as in the OP. – Karl Knechtel Oct 09 '22 at 19:22
  • @KarlKnechtel Interesting, I had overlooked `sum(map(int, str(n)))` but it is actually the fastest for moderately small numbers (math wins for small numbers, the str.count trick wins for big numbers, but there's a window in between where map is best). I've updated my answer to include this approach in the graph, thanks for bringing it to my attention. – wim Oct 09 '22 at 21:01
17

If you want to keep summing the digits until you get a single-digit number (one of my favorite characteristics of numbers divisible by 9) you can do:

def digital_root(n):
    x = sum(int(digit) for digit in str(n))
    if x < 10:
        return x
    else:
        return digital_root(x)

Which actually turns out to be pretty fast itself...

%timeit digital_root(12312658419614961365)

10000 loops, best of 3: 22.6 µs per loop
d8aninja
  • 3,233
  • 4
  • 36
  • 60
8

This might help

def digit_sum(n):
    num_str = str(n)
    sum = 0
    for i in range(0, len(num_str)):
        sum += int(num_str[i])
    return sum
Aftab Lateef
  • 81
  • 1
  • 1
  • 1
    thanks, this one helped me in the problem: examine if the given number can give modulo 0 after you sum up its digits. – Yannis Dran Nov 25 '16 at 22:55
8

Found this on one of the problem solving challenge websites. Not mine, but it works.

num = 0            # replace 0 with whatever number you want to sum up
print(sum([int(k) for k in str(num)]))
Falko
  • 17,076
  • 13
  • 60
  • 105
Ash
  • 175
  • 1
  • 6
4

Doing some Codecademy challenges I resolved this like:

def digit_sum(n):
    digits = []
    nstr = str(n)
    for x in nstr:
        digits.append(int(x))
    return sum(digits)
Alon Gouldman
  • 3,025
  • 26
  • 29
1

Whether it's faster to work with math or strings here depends on the size of the input number.

For small numbers (fewer than 20 digits in length), use division and modulus:

def sum_digits_math(n):
    r = 0
    while n:
        r, n = r + n % 10, n // 10
    return r

For large numbers (greater than 30 digits in length), use the string domain:

def sum_digits_str_fast(n):
    d = str(n)
    return sum(int(s) * d.count(s) for s in "123456789")

There is also a narrow window for numbers between 20 and 30 digits in length where sum(map(int, str(n))) is fastest. That is the purple line in the graph shown below (click here to zoom in).

profile

The performance profile for using math scales poorly as the input number is bigger, but each approach working in string domain appears to scale linearly in the length of the input. The code that was used to generate these graphs is here, I'm using CPython 3.10.6 on macOS.

wim
  • 338,267
  • 99
  • 616
  • 750
0

Try this

    print(sum(list(map(int,input("Enter your number ")))))
0

Here is a solution without any loop or recursion but works for non-negative integers only (Python3):

def sum_digits(n):
    if n > 0:
        s = (n-1) // 9    
        return n-9*s
    return 0
0

A base 10 number can be expressed as a series of the form

a × 10^p + b × 10^p-1 .. z × 10^0

so the sum of a number's digits is the sum of the coefficients of the terms.

Based on this information, the sum of the digits can be computed like this:

import math

def add_digits(n):
    # Assume n >= 0, else we should take abs(n)
    if 0 <= n < 10:
        return n
    r = 0
    ndigits = int(math.log10(n))
    for p in range(ndigits, -1, -1):
        d, n = divmod(n, 10 ** p)
        r += d
    return r

This is effectively the reverse of the continuous division by 10 in the accepted answer. Given the extra computation in this function compared to the accepted answer, it's not surprising to find that this approach performs poorly in comparison: it's about 3.5 times slower, and about twice as slow as

sum(int(x) for x in str(n))
snakecharmerb
  • 47,570
  • 11
  • 100
  • 153
-1

Why is the highest rated answer 3.70x slower than this ?

% echo; ( time (nice echo 33785139853861968123689586196851968365819658395186596815968159826259681256852169852986 \
 | mawk2 'gsub(//,($_)($_)($_))+gsub(//,($_))+1' | pvE0 \
 | mawk2 '
   
   function __(_,___,____,_____) {

                  ____=gsub("[^1-9]+","",_)~""
                ___=10
       while((+____<--___) && _) {
            _____+=___*gsub(___,"",_) 
       }
       return _____+length(_) } 

    BEGIN { FS=OFS=ORS
                RS="^$" 
    } END { 
            print __($!_) }' )| pvE9 ) | gcat -n | lgp3 ;

      in0:  173MiB 0:00:00 [1.69GiB/s] [1.69GiB/s] [<=>                                            ]
     out9: 11.0 B 0:00:09 [1.15 B/s] [1.15 B/s] [<=>                                               ]
      in0:  484MiB 0:00:00 [2.29GiB/s] [2.29GiB/s] [  <=>                                          ]
( nice echo  | mawk2 'gsub(//,($_)($_)($_))+gsub(//,($_))+1' | pvE 0.1 in0 | )  

 8.52s user 1.10s system 100% cpu 9.576 total
 1  2822068024



% echo; ( time ( nice echo 33785139853861968123689586196851968365819658395186596815968159826259681256852169852986 \
     \
     | mawk2 'gsub(//,($_)($_)($_))+gsub(//,($_))+1' | pvE0 \
     |  gtr -d '\n' \
     \
     |  python3 -c 'import math, os, sys;

        [ print(sum(int(digit) for digit in str(ln)), \
                                            end="\n") \
         \
         for ln in sys.stdin ]' )| pvE9 ) | gcat -n | lgp3 ;


      in0:  484MiB 0:00:00 [ 958MiB/s] [ 958MiB/s] [     <=>                                       ]
     out9: 11.0 B 0:00:35 [ 317miB/s] [ 317miB/s] [<=>                                             ]
( nice echo  | mawk2 'gsub(//,($_)($_)($_))+gsub(//,($_))+1' | pvE 0.1 in0 | )  

 35.22s user 0.62s system 101% cpu 35.447 total
     1  2822068024

And that's being a bit generous already. On this large synthetically created test case of 2.82 GB, it's 19.2x slower.

 % echo; ( time ( pvE0 < testcases_more108.txt  |  mawk2 'function __(_,___,____,_____) { ____=gsub("[^1-9]+","",_)~"";___=10; while((+____<--___) && _) { _____+=___*gsub(___,"",_) }; return _____+length(_) } BEGIN { FS=RS="^$"; CONVFMT=OFMT="%.20g" } END { print __($_) }'  ) | pvE9 ) |gcat -n | ggXy3  | lgp3; 

      in0:  284MiB 0:00:00 [2.77GiB/s] [2.77GiB/s] [=>                             ]  9% ETA 0:00:00
     out9: 11.0 B 0:00:11 [1016miB/s] [1016miB/s] [<=>                                             ]
      in0: 2.82GiB 0:00:00 [2.93GiB/s] [2.93GiB/s] [=============================>] 100%            
( pvE 0.1 in0 < testcases_more108.txt | mawk2 ; )

  8.75s user 2.36s system 100% cpu 11.100 total
     1  3031397722

% echo; ( time ( pvE0 < testcases_more108.txt  | gtr -d '\n' |  python3 -c 'import sys; [ print(sum(int(_) for _ in str(__))) for __ in sys.stdin ]' ) | pvE9 ) |gcat -n | ggXy3  | lgp3;  


      in0: 2.82GiB 0:00:02 [1.03GiB/s] [1.03GiB/s] [=============================>] 100%            
     out9: 11.0 B 0:03:32 [53.0miB/s] [53.0miB/s] [<=>                                             ]
( pvE 0.1 in0 < testcases_more108.txt | gtr -d '\n' | python3 -c ; )  

  211.47s user 3.02s system 100% cpu 3:32.69 total
     1  3031397722

—————————————————————

UPDATE : native python3 code of that concept - even with my horrific python skills, i'm seeing a 4x speedup :

% echo; ( time ( pvE0 < testcases_more108.txt  \
\
 |python3 -c 'import re, sys;

  print(sum([ sum(int(_)*re.subn(_,"",__)[1] 

     for _ in [r"1",r"2", r"3",r"4",
               r"5",r"6",r"7",r"8",r"9"])

 for __ in sys.stdin ]))' |pvE9))|gcat -n| ggXy3|lgp3   

      in0: 1.88MiB 0:00:00 [18.4MiB/s] [18.4MiB/s] [>                              ]  0% ETA 0:00:00
     out9: 0.00 B 0:00:51 [0.00 B/s] [0.00 B/s] [<=>                                               ]
      in0: 2.82GiB 0:00:51 [56.6MiB/s] [56.6MiB/s] [=============================>] 100%            
     out9: 11.0 B 0:00:51 [ 219miB/s] [ 219miB/s] [<=>                                             ]

( pvE 0.1 in0 < testcases_more108.txt | python3 -c  | pvE 0.1 out9; ) 



48.07s user 3.57s system 100% cpu 51.278 total
 1  3031397722

Even the smaller test case managed a 1.42x speed up :

 echo; ( time (nice echo 33785139853861968123689586196851968365819658395186596815968159826259681256852169852986 \ 
 | mawk2 'gsub(//,($_)($_)$_)+gsub(//,$_)+1' ORS='' | pvE0 | python3 -c 'import re, sys; print(sum([ sum(int(_)*re.subn(_,"",__)[1] for _ in [r"1",r"2", r"3",r"4",r"5",r"6",r"7",r"8",r"9"]) for __ in sys.stdin ]))'  | pvE9  ))  |gcat -n | ggXy3 | lgp3 


      in0:  484MiB 0:00:00 [2.02GiB/s] [2.02GiB/s] [  <=>                                          ]
     out9: 11.0 B 0:00:24 [ 451miB/s] [ 451miB/s] [<=>                                             ]
( nice echo  | mawk2 'gsub(//,($_)($_)$_)+gsub(//,$_)+1' ORS='' | pvE 0.1 in0)

 20.04s user 5.10s system 100% cpu 24.988 total
   1    2822068024
RARE Kpop Manifesto
  • 2,453
  • 3
  • 11
  • 1
    Because you're comparing a domain-specific language designed for text processing with a high-level, general-purpose programming language. Apples and oranges. – wim Mar 22 '22 at 17:56
  • that's only because i'm bad at python and might not be able to code it optimally, but the concept is identical - why sum up digits 1 at a time when one can use python3 re.sub( ) to batch process each digit at high speed ? – RARE Kpop Manifesto Mar 22 '22 at 18:08
  • It sounds like an interesting idea, and I'd be interested to see how it compares. Maybe you can write a pseudocode version and someone can translate it to Python, but the awk version is incomprehensible to me at least – wim Mar 22 '22 at 18:10
  • I've updated with a python version of it - you really gotta excuse my horrific python code - i couldn't get the RE engine to loop integers so I had to manually code in an array of 9 of them – RARE Kpop Manifesto Mar 22 '22 at 18:39
  • it's faster not because i'm some sort of python guru. it works because this is one of those cases where "doing arithmetic" is detrimental when none is needed. One can expand that list with r"[Aa]" r"[Bb]" etc , and make it directly sum up hex digits too. You can also generialize that concept by cycling through the bytes - their # of occurrences multiplied by byte ordinal value, and can get a "summation" of byte values in a binary file (for what use case, i don't know, but it's easily generalizable) – RARE Kpop Manifesto Mar 22 '22 at 19:24
  • how is it slower ? You can directly replicate my test from the code I've pasted above. pvE0 or pvE9 are just customized versions of pv. "lgp3" is just to create line gaps. "ggXy3" is just to perform customized color grepping. I'm not python expert, but it seems to me youre trying to cycle the regex sub( ) for all 9 digits, PER DIGIT of the input (is it?) – RARE Kpop Manifesto Mar 22 '22 at 22:59
  • oh yea we're not even remotely on the same order of magnitude of inputs. a string solution is definitely only advantageous on very large inputs - your gist seems be testing all lengths from 1 to 120 digits. Those 2 test cases of mine are 507,500,000 or so digits for the smaller test case (half a billion, give a take, as a single integer), and 3.03 billion digits for the larger one. We aren't even on the same page - no wonder we kept talking past each other – RARE Kpop Manifesto Mar 22 '22 at 23:04
  • Ah yep, you're right I messed it up by iterating `str(n)` by digit. I was thrown off by your code iterating `sys.stdin`, by lines, when there is actually only one line to work with! Updated [the gist](https://gist.github.com/wimglenn/d6c0ffb06e0e40e247cd6eacb0d54745), unfortunately the regex solution is still much slower for small numbers, and also slower than plain old string count operations for large numbers. See [my answer](https://stackoverflow.com/a/71566286/674039) for an updated graph. – wim Mar 23 '22 at 02:03
  • 1
    It's hard to understand this as a Python answer because of all the surrounding shell code for timing. Python has built-in timing functionality in the standard library: the `timeit` module allows for timing code programmatically, as well as timing short code snippets at the command line. Also note that you are assuming input as a string, rather than as a Python (arbitrary-size) integer. – Karl Knechtel Oct 09 '22 at 16:02
  • When I try your approach, I get impossibly slow code for a million-digit string input. I assume this is because it needlessly iterates over the string (the outer `sum`). It's not a million times slower; I assume results get cached somehow. However, when I fix the outer loop issue (`sum(int(_)*subn(_, '', test_str)[1] for _ in '123456789')`), the result is **still** slower (about 0.12 seconds per trial) than simply using OP's approach directly on a string (about 0.08 seconds). Considering, however, that simply doing the int->str conversion takes more than 13 seconds.... – Karl Knechtel Oct 09 '22 at 16:18
-2

you can also try this with built_in_function called divmod() ;

number = int(input('enter any integer: = '))
sum = 0
while number!=0: 
    take = divmod(number, 10) 
    dig = take[1] 
    sum += dig 
    number = take[0] 
print(sum) 

you can take any number of digit