9

I'm trying to write a Python script that finds all integers (N) where a certain power of the sum of the digits of N is equal to N. For example, N=81 qualifies, because 8 + 1 = 9, and a certain power of 9 (namely 2) = 81.

The ranges I chose are arbitrary. My script works but it is very, very slow. Ideally, I'd want to find the first 30 such integers in about 6000ms.

My first solution:

def powerOfSum1():
    listOfN = []
    arange = [a for a in range(11, 1000000)] #range of potential Ns
    prange = [a for a in range(2, 6)] # a range for the powers to calculate
    for num in arange:
        sumOfDigits = sum(map(int, str(num)))
        powersOfSum = [sumOfDigits**p for p in prange]
        if num in powersOfSum:
            listOfN.append(num)
    return listOfN

In my second solution, I tried storing all the powers for each sumOfDigits, but that did not improve the performance much.

def powerOfSum2():
    listOfN = []
    powers= {}
    for num in range(11, 1000000):
        sumOfDigits = sum(map(int, str(num)))
        summ = str(sumOfDigits)
        if summ in powers:
            if num in powers[summ]:
                listOfN.append(num)
        else:
            powersOfSum = [sumOfDigits**p for p in range(2,6)]
            powers[summ] = powersOfSum
            if num in powers[summ]:
                listOfN.append(num)
    return listOfN

I haven't studied data structures and algorithms yet, so I'd appreciate any pointers on making this script more efficient.

5u2ie
  • 93
  • 8
  • Please fix the indentation of your code. – thefourtheye Sep 14 '15 at 01:17
  • `arange = range(11, 1000000)` works like `arange = [a for a in range(11, 1000000)] ` – Jose Ricardo Bustos M. Sep 14 '15 at 01:28
  • @5u2ie What version of python are you using? – Anand S Kumar Sep 14 '15 at 01:39
  • Option 2 takes 2.1 seconds on my box (in Py3; Py2 takes about 3s). Isn't that well within your 6000ms limit? – paxdiablo Sep 14 '15 at 01:39
  • Using divmod to sum your digits will be faster, also make sure if using python2 to use xrange – Padraic Cunningham Sep 14 '15 at 01:48
  • @PadraicCunningham: [faster, but still not optimal](http://stackoverflow.com/questions/14939953/sum-the-digits-of-a-number-python) – fjarri Sep 14 '15 at 01:48
  • @paxdiablo -- Her displayed code does 11 results and she said she wanted first 30 within 6s. That's _much_ harder... – Patrick Maupin Sep 14 '15 at 02:35
  • 2
    When calculating integer sequences, you should first go to Sloane's and input the sequence. This is sequence [A023106 "a(n) is a power of the sum of its digits."](https://oeis.org/A023106). The first 32 such numbers up to 68719476736 can be found by clicking on the 'list' link. Often one can find algorithms (which may or may not be efficient) also given, as well as references. – ninjagecko Sep 14 '15 at 03:07
  • [addendum] There is also linked the [first such 1137 numbers, the largest of which could fill a few paragraphs](https://oeis.org/A023106/b023106.txt) – ninjagecko Sep 14 '15 at 04:36

4 Answers4

4

Your solution examines every possible integer to see it could be a solution. It's more efficient to only examine the integers that are actually powers to see if they are valid answers -- because there are fewer of those. Here is something that does this. But it will probably take longer than 6s to find 30 -- they get scarce fast.

EDIT -- updated to do the faster digit-summing suggested in the comments by Padraic Cunningham and fjarri, and then updated to add a couple of more tweaks, make it a generator, and make it Python-3 friendly.

It still gets slow fast, but the algorithm may be parallelizable -- maybe you could put the digit-summing in a separate process.

EDIT 2 -- Shaved off some time by doing a quick check of whether the base and the result value are equal modulo 9. There may be further number theory tricks afoot...

import heapq
import functools


def get_powers():
    heap = []
    push = functools.partial(heapq.heappush, heap)
    pop = functools.partial(heapq.heappop, heap)
    nextbase = 3
    nextbasesquared = nextbase ** 2
    push((2**2, 2, 2))
    while 1:
        value, base, power = pop()
        if base % 9 == value % 9:
            r = 0
            n = value
            while n:
                r, n = r + n % 10, n // 10
            if r == base:
                yield value, base, power
        power += 1
        value *= base
        push((value, base, power))
        if value > nextbasesquared:
            push((nextbasesquared, nextbase, 2))
            nextbase += 1
            nextbasesquared = nextbase ** 2


for i, info in enumerate(get_powers(), 1):
    print(i, info)
Patrick Maupin
  • 8,024
  • 2
  • 23
  • 42
4

This is a great time to break out a profiler and see where your code is spending all its time. To do that, I put a little cProfiler wrapper around your code:

#!/usr/bin/env python

import cProfile

import math


def powerOfSum1():
    listOfN = []
    arange = [a for a in range(11, 1000000)] #range of potential Ns
    prange = [a for a in range(2, 6)] # a range for the powers to calculate
    for num in arange:
        sumOfDigits = sum(map(int, str(num)))
        powersOfSum = [sumOfDigits**p for p in prange]
        if num in powersOfSum:
            listOfN.append(num)
    return listOfN


def main():
    cProfile.run('powerOfSum1()')

if __name__ == "__main__":
    main()

Running this, here's what I got:

⌁ [alex:/tmp] 44s $ python powers.py
         1999993 function calls in 4.089 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.005    0.005    4.089    4.089 <string>:1(<module>)
        1    0.934    0.934    4.084    4.084 powers.py:7(powerOfSum1)
   999989    2.954    0.000    2.954    0.000 {map}
       10    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.017    0.009    0.017    0.009 {range}
   999989    0.178    0.000    0.178    0.000 {sum}

If you look, it seems like most of the time is being spend on that map call, where you turn the num to a string, then make each digit an int, which is then summed.

It makes a lot of sense that this would be the slow part of the program. Not only do you do it a lot, but it's a slow operation: On that one line, you do a string-parsing operation, and then map an int conversion function over each char in the string, and then you sum them up.

I bet that if you can compute the sum of digits without doing string conversion first, it'll be a lot faster.

Let's try that. I made some other changes, like removing the redundant list comprehensions at the start. Here's what I got:

#!/usr/bin/env python

#started at 47.56

import cProfile

import math

MAXNUM = 10000000

powersOf10 = [10 ** n for n in range(0, int(math.log10(MAXNUM)))]

def powerOfSum1():
    listOfN = []
    arange = range(11, MAXNUM) #range of potential Ns
    prange = range(2, 6) # a range for the powers to calculate
    for num in arange:
        sumOfDigits = 0
        for p in powersOf10:
            sumOfDigits += num / p % 10
        powersOfSum = []
        curr = sumOfDigits
        for p in prange:
            curr = curr * sumOfDigits
            if num < curr:
                break
            if num == curr:
                listOfN.append(num)
    return listOfN

def main():
    cProfile.run('powerOfSum1()')

if __name__ == "__main__":
    main()

What does cProfile have to say?

⌁ [alex:/tmp] 3m42s $ python powers.py
         15 function calls in 0.959 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.006    0.006    0.959    0.959 <string>:1(<module>)
        1    0.936    0.936    0.953    0.953 powers.py:13(powerOfSum1)
       10    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.017    0.009    0.017    0.009 {range}

4 seconds to 0.9 seconds? Much better.

If you want to really see the effect, add an extra zero to the upper bound of arange. On my box, the original code takes ~47 seconds. The modified code takes ~10.

The profiler is your friend, and string conversion isn't free when you do it hundreds of thousands of times :)

Alex
  • 4,316
  • 2
  • 24
  • 28
  • Knowledge of the profiler is imperative, and this is a good tutorial. OTOH, it's also perhaps a bit premature. I haven't run a profiler yet, and on my version on my machine, the same first 11 numbers show up in under 1/100 second... – Patrick Maupin Sep 14 '15 at 02:19
  • `break`ing after the `append()` would avoid one more go around. – AChampion Sep 14 '15 at 02:31
3

Update: I've discovered this is a Project Euler problem (#119) and I've found basically the same solution already documented: http://www.mathblog.dk/project-euler-119-sum-of-digits-raised-power/

I'm not sure if I am over simplifying but just iterating over the powers for a range of numbers seems pretty quick. You can't guarantee the order so calculate more than you need and then sort and take the top 30. I can't prove I've got them all but I've tested base up to 500 and exp up to 50 and it returns the same top 30. It should be noted the OP only tested exponents up to 5, which limits the number of results significantly:

def powerOfSum():
    listOfN = []
    for base in range(2, 100):
        num = base
        for _ in range(2, 10):
            num *= base
            if sum(map(int, str(num))) == base:
                listOfN.append(num)
    return sorted(listOfN)[:30]
powerOfSum()

Output

[81,
 512,
 2401,
 4913,
 5832,
 17576,
 19683,
 234256,
 390625,
 614656,
 1679616,
 17210368,
 34012224,
 52521875,
 60466176,
 205962976,
 612220032,
 8303765625,
 10460353203,
 24794911296,
 27512614111,
 52523350144,
 68719476736,
 271818611107,
 1174711139837,
 2207984167552,
 6722988818432,
 20047612231936,
 72301961339136,
 248155780267521]

Running timeit on it (including the sort) I get:

100 loops, best of 3: 1.37 ms per loop
AChampion
  • 29,683
  • 4
  • 59
  • 75
  • If there is no requirement to generate them in order or know up front if you have the right constants to iterate to, this is plenty quick. – Patrick Maupin Sep 14 '15 at 04:28
  • Yes, fast if you know the constants but even with 500, 50 it only takes 214ms and finds 197 results. I really liked your heapsort for in order powers but it slowed down too fast (mid 20s). Seems odd, that it is that much slower compared to just dumb looping. – AChampion Sep 14 '15 at 04:43
  • @achampion such an elegant solution, however, when I execute your code (even with constants 500, 50) I'm missing the 11th and the 12th element (17210368 and 34012224) in the final sorted list, which makes me wonder. Also, I only get 149 results, any ideas why this is happening? – 5u2ie Sep 14 '15 at 10:42
  • Oops, I tried to optimise with a break, fixed it... should work again. They are 2nd powers on the same base. – AChampion Sep 14 '15 at 13:03
-1

[edit: Due to a bug in the given algorithm I was transcribing, this method is fairly slow (about as fast as the other methods). I'm keeping this here for code reference. However this seems to be the best that can be done without resorting to number theory tricks.]

When calculating integer sequences, you should first go to Sloane's and input the sequence. This is sequence A023106 "a(n) is a power of the sum of its digits.". The first 32 such numbers up to 68719476736 can be found by clicking on the 'list' link. Often one can find algorithms (which may or may not be efficient) also given, as well as references. There is also linked the first such 1137 numbers up to [some number large enough to fill up a few paragraphs]

Now as for an efficient algorithm: unless you have some way to skip ranges of numbers without looking at them, or unless we can exploit some mathematical property of the numbers, you are stuck with an O(N) algorithm. Another way you could approach it is try calculating all powers (which lets you skip all the numbers), and then testing each power P=n^m to see "is there a number whose digits sum to some power of P (or whatever)".

In fact, this algorithm is already provided to us in the above link. The algorithm given in the above link is (in Mathematica):

fQ[n_] := Block[
  {b = Plus @@ IntegerDigits[n]}, 
  If[b > 1, IntegerQ[ Log[b, n]] ]
]; 
Take[
  Select[ 
    Union[ Flatten[ 
      Table[n^m, {n, 55}, {m, 9}]
    ]], fQ[ # ] &], 
  31
]
(* Robert G. Wilson v, Jan 28 2005 *)

The loose translation is:

def test(n):
     digitSum = sum of digits of n
     return n is a power of digitSum
candidates = set(n^m for n in range(55) for m in range(9))
matches = [c for c in candidates if test(c)]

A full implementation would be:

from math import *  # because math should never be in a module

def digitSum(n):
    return sum(int(x) for x in str(n))

def isPowerOf(a,b):
    # using log WILL FAIL due to floating-point errors
    # e.g. log_3{5832} = 3.0000..04
    if b<=1:
        return False
    # using http://stackoverflow.com/a/4429063/711085
    while a%b==0:
        a = a / b
    return a==1

def test(n):
    return isPowerOf(n, digitSum(n))

M = 723019613391360  # max number to check
candidates = set(n**m for n in xrange(int(sqrt(M)+1)) 
                       for m in xrange(int(log(M,max(n,2)))+1))
result = list(sorted([c for c in candidates if test(c)]))

output:

>>> result
[2, 3, 4, 5, 6, 7, 8, 9, 81, 512, 2401, 4913, 5832, 17576, 19683, 234256, 390625, 614656, 1679616, 17210368, 34012224, 52521875, 60466176, 205962976, 612220032, 8303765625, 10460353203, 24794911296, 27512614111, 52523350144, 68719476736, 271818611107, 1174711139837, 2207984167552, 6722988818432, 20047612231936, 72301961339136, 248155780267521]

Unfortunately this takes a considerable amount of time. For example above, we have to check 53,863,062 candidates, which could take minutes.

ninjagecko
  • 88,546
  • 24
  • 137
  • 145
  • Ah yes, I was about to post an edit, since I simultaneously discovered this bug in the transcribed code (the fix was to check all candidates, `candidates = set(...)` replaced the incorrect `in range(55)... range(9)`). Keeping this answer here for reference by others. – ninjagecko Sep 14 '15 at 04:40