
I'm a beginner in Python trying to get better, and I stumbled across the following exercise:

Let n be an integer greater than 1 and s(n) the sum of the divisors of n. For example,

s(12) = 1 + 2 + 3 + 4 + 6 + 12 = 28

Also,

s(s(12)) = s(28) = 1 + 2 + 4 + 7 + 14 + 28 = 56

And

s(s(s(12))) = s(56) = 1 + 2 + 4 + 7 + 8 + 14 + 28 + 56 = 120

We use the notations:

s^1(n) = s(n)
s^2(n) = s(s(n))
s^3(n) = s(s(s(n)))
s^m(n) = s(s(...s(n)...)), m times

The integers n for which there exists a positive integer k such that

 s^m(n) = k * n

are called (m, k)-perfect. For instance, 12 is (3, 10)-perfect, since s^3(12) = s(s(s(12))) = 120 = 10 * 12.

Special categories:

For m = 1 we have multiperfect numbers.

A special case of the above exists for m = 1 and k = 2; these are called perfect numbers.

For m = 2 and k = 2 we have superperfect numbers.

Write a program which finds and prints all (m, k)-perfect numbers less than or equal to MAXNUM, for m <= MAXM. If an integer belongs to one of the special categories mentioned above, the program should print a relevant message. The program also has to print how many different (m, k)-perfect numbers were found, what percentage of the tested numbers they were, in how many occurrences for the different pairs (m, k), and how many were found from each special category (perfect numbers are counted as multiperfect as well).

Here's my code:

import time
start_time = time.time()
def s(n):
    tsum = 0
    i = 1
    con = n
    while i < con:
        if n % i == 0:
            temp = n / i
            tsum += i
            if temp != i:
                tsum += temp 
            con = temp
        i += 1                    
    return tsum
MAXM = 6
MAXNUM = 400000

i = 2

perc = 0
perc1 = 0
perf = 0
multperf = 0
supperf = 0
while i <= MAXNUM:
    pert = perc1
    num = i
    for m in xrange(1, MAXM + 1):        
        tsum = s(num)                
        if tsum % i == 0:            
            perc1 += 1
            k = tsum / i            
            mes = "%d is a (%d-%d)-perfect number" % (i, m, k)
            if m == 1:
                multperf += 1
                if k == 2:
                    perf += 1 
                    print mes + ", that is a perfect number"
                else:
                    print mes + ", that is a multiperfect number"               
            elif m == 2 and k == 2:
                supperf += 1
                print mes + ", that is a superperfect number"
            else:
                print mes        
        num = tsum        
    i += 1
    if pert != perc1: perc += 1
print "Found %d distinct (m-k)-perfect numbers (%.5f per cent of %d ) in %d occurrences" % (
perc, float(perc) / MAXNUM * 100, MAXNUM, perc1)
print "Found %d perfect numbers" % perf
print "Found %d multiperfect numbers (including perfect numbers)" % multperf
print "Found %d superperfect numbers" % supperf  
print time.time() - start_time, "seconds"

It works fine, but I would like suggestions on how to make it run faster. For instance, is it faster to use

I = 1
while I <= MAXM:
    ...
    I += 1

instead of

for I in xrange(1, MAXM + 1):

Would it be better if, instead of defining s(n) as a function, I put the code into the main program? Etc. If you have anything to suggest for me to read on how to make a program run faster, I'd appreciate it. And one more thing: originally the exercise required the program to be in C (which I don't know); having written this in Python, how difficult would it be to translate it into C?

DMH
  • Is this _really_ slow? This is only advice, but as a beginner, do you really have to find a way to make your code run "faster"? You should first be concerned with making your code more readable and maintainable. In fact, this is advice I used to give to (much) more experienced programmers too... – Sylvain Leroux Jun 24 '13 at 15:58
  • if you knew C, it would be very easy to convert... most of it is already C code, you just need curly braces and semicolons :) – Karoly Horvath Jun 24 '13 at 16:06
  • For performance, sometimes beginners focus on stuff that doesn't matter much. [*Here is the method*](http://stackoverflow.com/a/4299378/23771) I use. It will tell you which pieces of code are worth paying attention to. – Mike Dunlavey Jun 24 '13 at 16:10
  • I think the biggest key to better performance would be minimizing division/modulo operations. A different approach than what you've taken here would be finding the prime factorization of a number, then recombining the factors to compute the sum. Remembering factorizations, and noting that the factors of N=p*x (where p is a prime) are the combined list of the factors of x and each of those factors multiplied by p, would be good ideas (e.g. factors of 6 are 1, 2, 3; factors of 12=2*6 are 1, 2, 3 and 2, 4, 6). – twalberg Jun 24 '13 at 16:30
  • maybe you could throw in some memoization – Jan Matějka Jun 24 '13 at 16:40
  • seconding yaccz. add `lru_cache` from functools so `s()`... – andrew cooke Jun 24 '13 at 18:31
  • @Sylvain for MAXM = 6 and MAXNUM = 400000 it took 22 mins – user2516856 Jun 24 '13 at 19:06
  • All of the answers I see are not starting with a round of diagnosis. They are like throwing darts blindfolded. – Mike Dunlavey Jun 24 '13 at 19:48
  • I see a lot of good arguments here, an interesting work to optimize that function using mathematical analysis and several promising suggestions. But the original question was about a "beginner in Python trying to get better". "Better" definitively. "in Python" not sure. – Sylvain Leroux Jun 24 '13 at 21:04
  • @Sylvain Is it supposed to be "at Python" instead of "in Python"? Or maybe I'm missing the point? – user2516856 Jun 25 '13 at 01:03

2 Answers


The biggest improvements come from using a better algorithm. Things like

Would it be better if instead of defining s(n) as a function I put the code into the main program?

or whether to use a while loop instead of for i in xrange(1, MAXM + 1): make little difference, and should not be considered before one has reached a state where algorithmic improvements are at least very hard to come by.

So let's take a look at your algorithm and how we can drastically improve it without caring about minuscule things like whether a while loop or a for iteration are faster.

def s(n):
    tsum = 0
    i = 1
    con = n
    while i < con:
        if n % i == 0:
            temp = n / i
            tsum += i
            if temp != i:
                tsum += temp 
            con = temp
        i += 1                    
    return tsum

That already contains a good idea: you know that the divisors of n come in pairs, and you add both divisors once you find the smaller of the pair. You even handle squares correctly.

It works very well for numbers like 120: when you find the divisor 2, you set the stop condition to 60, when you find 3, to 40, ..., when you find 8, you set it to 15, when you find 10, you set it to 12, and then you have only the division by 11, and stop when i is incremented to 12. Not bad.

But it doesn't work so well when n is a prime: then con is never set to a value smaller than n, and you need to iterate all the way to n before you have the divisor sum. It's also bad for numbers of the form n = 2*p with a prime p, where you loop to n/2, or n = 3*p (to n/3, unless p = 2), etc.

By the prime number theorem, the number of primes not exceeding x is asymptotically x/log x (where log is the natural logarithm), and you have a lower bound of

Ω(MAXNUM² / log MAXNUM)

just for computing the divisor sums of the primes. That's not good.

Since you already consider the divisors of n in pairs d and n/d, note that the smaller of the two (ignoring the case d = n/d when n is a square for the moment) is smaller than the square root of n, so once the test divisor has reached the square root, you know that you have found and added all divisors, and you're done. Any further looping is futile wasted work.

So let us consider

def s(n):
    tsum = 0
    root = int(n**0.5)  # floor of the square root of n, at least for small enough n
    i = 1
    while i < root + 1:
        if n % i == 0:
            tsum += i + n/i
        i += 1
    # check whether n is a square, if it is, we have added root twice
    if root*root == n:
        tsum -= root
    return tsum

as a first improvement. Then you always loop to the square root, and computing s(n) for 1 <= n <= MAXNUM is Θ(MAXNUM^1.5). That's already quite an improvement. (Of course, you have to compute the iterated divisor sums, and s(n) can be larger than MAXNUM for some n <= MAXNUM, so you can't infer a complexity bound of O(MAXM * MAXNUM^1.5) for the total algorithm from that. But s(n) cannot be very much larger, so the complexity can't be much worse either.)
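As a quick sanity check (not part of the original answer), this version reproduces the chain from the question; it runs under both Python 2 and 3 thanks to the floor division //:

```python
def s(n):
    tsum = 0
    root = int(n**0.5)  # floor of the square root of n, at least for small enough n
    i = 1
    while i <= root:
        if n % i == 0:
            tsum += i + n // i  # add both divisors of the pair (i, n/i)
        i += 1
    if root * root == n:  # n is a square: root was added twice above
        tsum -= root
    return tsum

assert s(12) == 28
assert s(28) == 56
assert s(56) == 120  # the chain s(12), s(s(12)), s(s(s(12))) from the question
assert s(16) == 31   # a square: 1 + 2 + 4 + 8 + 16
assert s(13) == 14   # a prime: 1 + 13
```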

But we can still improve on that by using what twalberg suggested, using the prime factorisation of n to compute the divisor sum.

First, if n = p^k is a prime power, the divisors of n are 1, p, p², ..., p^k, and the divisor sum is easily computed (a closed formula for the geometric sum is

(p^(k+1) - 1) / (p - 1)

but whether one uses that or adds the k+1 powers of p dividing n is not important).
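For illustration, a small sketch comparing both routes (the helper names here are made up for this example):

```python
def s_prime_power_formula(p, k):
    # closed formula for the geometric sum 1 + p + p^2 + ... + p^k
    return (p**(k + 1) - 1) // (p - 1)

def s_prime_power_sum(p, k):
    # the same sum, adding the k+1 powers of p dividing p^k directly
    return sum(p**a for a in range(k + 1))

assert s_prime_power_formula(2, 3) == s_prime_power_sum(2, 3) == 15  # 1+2+4+8
assert s_prime_power_formula(3, 2) == s_prime_power_sum(3, 2) == 13  # 1+3+9
```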

Next, if n = p^k * m with a prime p and an m such that p does not divide m, then

s(n) = s(p^k) * s(m)

An easy way to see that decomposition is to write each divisor d of n in the form d = p^a * g where p does not divide g. Then p^a must divide p^k, i.e. a <= k, and g must divide m. Conversely, for every 0 <= a <= k and every g dividing m, p^a * g is a divisor of n. So we can lay out the divisors of n (where 1 = g_1 < g_2 < ... < g_r = m are the divisors of m)

 1*g_1   1*g_2  ...  1*g_r
 p*g_1   p*g_2  ...  p*g_r
  :        :           :
p^k*g_1 p^k*g_2 ... p^k*g_r

and the sum of each row is p^a * s(m).
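A tiny check of this multiplicativity, using the question's 12 = 4 * 3 (the naive s below is only for illustration, not an efficient implementation):

```python
def s(n):
    # naive divisor sum, fine for tiny n
    return sum(d for d in range(1, n + 1) if n % d == 0)

assert s(4) == 7              # 1 + 2 + 4
assert s(3) == 4              # 1 + 3
assert s(12) == s(4) * s(3)   # 28, since 4 and 3 are coprime
```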

If we have a list of primes handy, we can then write

def s(n):
    tsum = 1
    for p in primes:
        d = 1
        # divide out all factors p of n
        while n % p == 0:
            n = n//p
            d = p*d + 1
        tsum *= d
        if p*p > n: # n = 1, or n is prime
            break
    if n > 1:       # one last prime factor to account for
        tsum *= 1 + n
    return tsum

The trial division goes to the second largest prime factor of n [if n is composite] or the square root of the largest prime factor of n, whichever is larger. It has a worst-case bound for the largest divisor tried of n**0.5, which is reached for primes, but for most composites, the division stops much earlier.

If we don't have a list of primes handy, we can replace the line for p in primes: with for p in xrange(2, n): [the upper limit is not important, since it is never reached if it is larger than n**0.5] and get a not too much slower factorisation. (But it can easily be sped up a lot by avoiding even trial divisors larger than 2, that is using a list [2] + [3,5,7...] - best as a generator - for the divisors, even more by also skipping multiples of 3 (except 3), [2,3] + [5,7, 11,13, 17,19, ...] and if you want of a few further small primes.)
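A minimal sketch of such a generator, the 2-and-3 wheel just described (the name trial_divisors is invented for this example; the generator also yields some composites like 25 and 35, which is harmless for trial division, merely a wasted test):

```python
def trial_divisors():
    # 2 and 3, then all numbers of the form 6k-1 and 6k+1,
    # which skips every multiple of 2 and 3
    yield 2
    yield 3
    d = 5
    while True:
        yield d      # 6k - 1
        yield d + 2  # 6k + 1
        d += 6

def s(n):
    tsum = 1
    for p in trial_divisors():
        d = 1
        # divide out all factors p of n
        while n % p == 0:
            n = n // p
            d = p * d + 1
        tsum *= d
        if p * p > n:  # n = 1, or n is prime
            break
    if n > 1:          # one last prime factor to account for
        tsum *= 1 + n
    return tsum

g = trial_divisors()
assert [next(g) for _ in range(6)] == [2, 3, 5, 7, 11, 13]
assert s(12) == 28
assert s(97) == 98  # prime
```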

Now, that helped, but computing the divisor sums for all n <= MAXNUM still takes Ω(MAXNUM^1.5 / log MAXNUM) time (I haven't analysed it; that could also be an upper bound, or MAXNUM^1.5 could still be a lower bound; anyway, a logarithmic factor rarely makes much of a difference [beyond a constant factor]).

And you compute a lot of divisor sums more than once (in your example, you compute s(56) when investigating 12, again when investigating 28, again when investigating 56). To alleviate the impact of that, memoizing s(n) would be a good idea. Then you need to compute each s(n) only once.

And now we have already traded space for time, so we can use a better algorithm to compute the divisor sums for all 1 <= n <= MAXNUM in one go, with a better time complexity (and also smaller constant factors). Instead of trying out each small enough (prime) number whether it divides n, we can directly mark only multiples, thus avoiding all divisions that leave a remainder - which is the vast majority.

The easy method to do that is

def divisorSums(n):
    dsums = [0] + [1]*n
    for k in xrange(2, n+1):
        for m in xrange(k, n+1, k):
            dsums[m] += k
    return dsums

with an O(n * log n) time complexity. You can do a bit better (O(n * log log n) complexity) by using the prime factorisation, but that method is somewhat more complicated, so I'm not adding it now; maybe later.

Then you can use the list of all divisor sums to look up s(n) if n <= MAXNUM, and the above implementation of s(n) to compute the divisor sum for values larger than MAXNUM [or you may want to memoize the values up to a larger limit].

dsums = divisorSums(MAXNUM)
def memo_s(n):
    if n <= MAXNUM:
        return dsums[n]
    return s(n)

That's not too shabby,

Found 414 distinct (m-k)-perfect numbers (0.10350 per cent of 400000 ) in 496 occurrences
Found 4 perfect numbers
Found 8 multiperfect numbers (including perfect numbers)
Found 7 superperfect numbers
12.709428072 seconds

for

import time
start_time = time.time()


def s(n):
    tsum = 1
    for p in xrange(2,n):
        d = 1
        # divide out all factors p of n
        while n % p == 0:
            n = n//p
            d = p*d + 1
        tsum *= d
        if p*p > n: # n = 1, or n is prime
            break
    if n > 1:       # one last prime factor to account for
        tsum *= 1 + n
    return tsum
def divisorSums(n):
    dsums = [0] + [1]*n
    for k in xrange(2, n+1):
        for m in xrange(k, n+1, k):
            dsums[m] += k
    return dsums

MAXM = 6
MAXNUM = 400000

dsums = divisorSums(MAXNUM)
def memo_s(n):
    if n <= MAXNUM:
        return dsums[n]
    return s(n)

i = 2

perc = 0
perc1 = 0
perf = 0
multperf = 0
supperf = 0
while i <= MAXNUM:
    pert = perc1
    num = i
    for m in xrange(1, MAXM + 1):
        tsum = memo_s(num)
        if tsum % i == 0:
            perc1 += 1
            k = tsum / i
            mes = "%d is a (%d-%d)-perfect number" % (i, m, k)
            if m == 1:
                multperf += 1
                if k == 2:
                    perf += 1
                    print mes + ", that is a perfect number"
                else:
                    print mes + ", that is a multiperfect number"
            elif m == 2 and k == 2:
                supperf += 1
                print mes + ", that is a superperfect number"
            else:
                print mes
        num = tsum
    i += 1
    if pert != perc1: perc += 1
print "Found %d distinct (m-k)-perfect numbers (%.5f per cent of %d ) in %d occurrences" % (
perc, float(perc) / MAXNUM * 100, MAXNUM, perc1)
print "Found %d perfect numbers" % perf
print "Found %d multiperfect numbers (including perfect numbers)" % multperf
print "Found %d superperfect numbers" % supperf
print time.time() - start_time, "seconds"

By memoizing also the needed divisor sums for n > MAXNUM:

d = {}
for i in xrange(1, MAXNUM+1):
    d[i] = dsums[i]
def memo_s(n):
    if n in d:
        return d[n]
    else:
        t = s(n)
        d[n] = t
        return t

the time drops to

3.33684396744 seconds
Daniel Fischer
from functools import lru_cache

...

@lru_cache(maxsize=None)
def s(n):
    ...

should make it significantly faster.

[update] oh, sorry, that was added in 3.2 according to the docs. but any cache will do. see Is there a decorator to simply cache function return values?
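For Python 2, a minimal hand-rolled cache along those lines might look like this (the memoize name is just illustrative; any dict-backed decorator will do):

```python
def memoize(f):
    cache = {}
    def wrapper(n):
        # compute f(n) only once, then serve repeats from the dict
        if n not in cache:
            cache[n] = f(n)
        return cache[n]
    return wrapper

@memoize
def s(n):
    # naive divisor sum, just to demonstrate the decorator
    return sum(d for d in range(1, n + 1) if n % d == 0)

assert s(12) == 28
assert s(12) == 28  # second call is served from the cache
```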

andrew cooke
  • lru_cache cannot be imported, unfortunately, I'm guessing because I'm using Python 2.7. Thanks for the suggestion anyway. – user2516856 Jun 24 '13 at 19:10