6

I am currently solving a mathematical problem in which I need to count the number of reduced proper fractions with both numerator and denominator more more than 1000000 (10^6).

I have code that works fine for small numbers; the example given (value=8) gives the correct (given) answer 21.

But this code seems to be very slow for big numbers for reasons I don't know. I read a whole bunch of similar questions on SO, but couldn't find anything that was useful. I had a closer look at this and this, but that didn't really help me out. My code works with acceptable speed for values until 1000, then it gets super-super-slow.

import math

def is_prime(n):
    if n == 2:
        return True
    if n % 2 == 0 or n <= 1:
        return False
    sqr = int(math.sqrt(n)) + 1
    for divisor in range(3, sqr, 2):
        if n % divisor == 0:
            return False
    return True

def get_divisors(n):
    liste = [1]
    if n % 2 == 0:
        liste.append(2)
    for divisor in range(3, n+1):
        if n % divisor == 0:
            liste.append(divisor)
    return liste

def intersect(a, b):
    return list(set(a) & set(b))

until = 1000

primes = list()
for i in range(int(until)):
    if i != 1 and i != 0:
        if is_prime(i):
            primes.append(i)

pos = 0
for i in range(1, until+1):
    if i%50 == 0:
        print(i)
    if is_prime(i):
        pos += (i-1)
    else:
        di = get_divisors(i)
        di.remove(1)
        for j in range(1, i):
            dj = get_divisors(j)
            if intersect(di, dj)==[]:
                pos+=1   


print(pos)

I want to know which parts of my program are reducing the speed and how to fix these issues.

גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61
monamona
  • 1,195
  • 2
  • 17
  • 43
  • Just wanted to add that I made SURE it worked by testing out some other numbers where I figured out the correct result the old fashioned way (pen and paper) – monamona Jan 23 '18 at 20:13
  • 2
    this is a classic. 1) use a sieve to generate a lot of primes, 2) when computing divisors, loop to sqrt(n): if you find a low divisor, you've found the high one by dividing (unless it's the same because it's a perfect square). http://codereview.stackexchange.com contains a lot of examples of this – Jean-François Fabre Jan 23 '18 at 20:15
  • The Primes are not the problem, I need 2.76 seconds to generate the primes until 1000k – monamona Jan 23 '18 at 20:15
  • You seem to be pre-computing a list of `primes`, but then your main loop doesn't take advantage of this; it still calls `is_prime()`. So, your inefficient primality test is being inefficiently called many times. – TypeIA Jan 23 '18 at 20:18
  • @TypeIA I only use it beacuse it is faster in total! I know exactly that when the denominator x is a prime the number of new unique fractions is x-1, so I add it intantly instead of going through another for-loop – monamona Jan 23 '18 at 20:37
  • @monamona I get that, but why are you _precomputing_ a list of primes without using that list at all? I think you meant to check `primes` instead of call `is_prime()` in your main loop, right? As it stands now your precomputation is simply dead code. – TypeIA Jan 23 '18 at 20:44
  • 2
    Please see my answer for an O(n) algorithm. – גלעד ברקן Jan 25 '18 at 11:59
  • "vaguely" resembles this problem https://www.codewars.com/kata/55b7bb74a0256d4467000070/train/python – NomadMonad Jun 04 '21 at 11:17

6 Answers6

4

If the primes themselves are fast enough (I'd still recommend an Eratosthenes sieve which is better than your okay prime testing to mass-generate primes), the generation of the divisors isn't

for divisor in range(3, n+1):
    if n % divisor == 0:
        liste.append(divisor)

this loop has O(n) complexity when it could have O(n**0.5) complexity like this:

liste = set()  # okay, the var name isn't optimal now :)
for divisor in range(3, int((n+1)**0.5)+1):
    if n % divisor == 0:
        other = n // divisor
        liste.add(divisor)
        liste.add(other)

When you found a "low" (<=sqrt(n)) divisor, other is just the complementary "high" divisor. This allows the loop to be much shorter.

since you're going to perform intersection by converting to a set, why not creating a set in the first place? The advantage is that in a perfect square, you're not adding the same value twice (you could test, but not sure it would be faster, since collisions are rare)

now everything is set so:

if intersect(di, dj)==[]:
            pos+=1   

becomes:

if not di.isdisjoint(dj):
   pos += 1
  • no set creation at each iteration
  • no set creation to know if the sets have common values. Just use set.isdisjoint

Last thing: As noted in comments, when testing for primality in the main loop, you're calling is_prime() again, instead of storing your generated primes in a set and test with in instead.

Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219
2

For starters, quit doing a prime factorization. Instead, use Euclid's algorithm to check whether the numerator and denominator have a GCD of 1.

If you do find a need for primes, search for faster generating algorithms, such as the Sieve of Eratosthenes.

Finally, think on this: generate the primes (efficiently); factor each denominator, as you're doing now. However, instead of looping through all possible numerators, how can you generate the valid numerators for your needs?

Prune
  • 76,765
  • 14
  • 60
  • 81
  • And then there is [`math.gcd()`](https://docs.python.org/3.5/library/math.html#math.gcd) – Mr. T Jan 23 '18 at 20:24
1

A few simple optimizations are possible. You could try these and see whether performance improves by enough to work for your use case.

First, the way in which you are finding the primes up to until could be improved. Rather than checking each number in the range independently against all numbers up to its square root, you could remember primes you've already found so far and only check those up to the square root of the new candidate prime. If a number is non-prime, it is divisible by some prime less than or equal to its square root. Since there are a lot fewer primes less than n than there are numbers less than n as n gets bigger, you can avoid lots of divisor checking by reusing the partial results you have already constructed at each step. This idea is like doing the Sieve or Eratosthenes without writing down all the numbers and crossing them out, but rather just writing down the ones that turn out to be prime as you go.

Second, when you are checking whether the fractions are in lowest terms, you don't need to look for any common divisors; checking for just for prime common divisors will be sufficient. Why? if the fraction is not in lowest terms, there is a common denominator which will be divisible by some prime number. Therefore, we can dispense with getting all factors, and focus on just getting the prime factorization. This can be sped up by only checking the prime numbers we got in the first step as candidate factors and, when we find a match, dividing at each step by the factor we've found. Thus, we can avoid lots of checks here as well. Something like this:

p = 1
while number > 1 do
    if number % primes[p] == 0 then
        print "Another prime factor is " + primes[p]
        number = number / primes[p]
    else then p = p + 1

The second of these optimizations will probably give you the most bang for your buck.

Patrick87
  • 27,682
  • 3
  • 38
  • 73
1

Generate the prime factorizations of every possible denominator. From each prime factorization, calculate the totient of the denominator (https://en.wikipedia.org/wiki/Euler%27s_totient_function). That is the number of proper fractions with that denominator. Add up all of those and you're done.

Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87
  • Please see my answer for an O(n) application of the totient. – גלעד ברקן Jan 25 '18 at 12:03
  • @גלעדברקן Your algorithm is hard to follow. It looks like you're shooting for this one: http://www.mathblog.dk/project-euler-72-reduced-proper-fractions/ , which is beautiful and fast, but still not O(n). There is an O(n) way that involves iterating through prime factorizations, but it's not nearly so pretty. It's fun, though, so I may add it above – Matt Timmermans Jan 25 '18 at 13:08
  • I am not familiar with link you posted. Of course this algorithm is `O(n)`. Every number is only generated once and the pointer to the next prime is incremented a maximum of `n` times. (Please see actual implementations for a problem using a similar method I proposed here https://stackoverflow.com/questions/47229270/count-of-divisors-of-numbers-till-n-in-on/) – גלעד ברקן Jan 25 '18 at 13:32
  • @גלעדברקן Sieve of Euler is not O(n) – Pham Trung Jan 25 '18 at 15:21
1

This problem needs to be looked from a different angle. Let's look at the problem.

For a number n = a^k1 * b ^ k2 * c ^ k3 with a, b, c ... is prime number, the answer for the problem is n - all number that is less than n and divides by either a or b or c ....

So, assuming that we know which are a, b, c ..., how to calculate all number that is less than n and divides by a or b or c ?? Let take a look at inclusion-exclusion theorem, and this question Number of positive integers in [1,1e18] that cannot be divided by any integers in [2,10]. Now, we can have a reasonable algo to handle that with time complexity is O(2^m) with m is the number of prime number. With the assumption that n < 10^6, we can see that m < 8 (very raw estimation: 2*3*5*7*11*13*17*23 > 10^6)

Finally, now to get the list of prime number for each number, we can use Sieve of Eratosthenes

Pseudo code:

max = 1000001
// First, get all the prime number factors for each number

list_prime = int[max][]
for i in range (2, max):
    if list_prime[i] = []:
       list_prime[i].append(i)
       for j = i; j < max; j += i
           list_prime[j].append(i)

// Second, calculate the number of value less than i and not share any factor with i

result = 0
for i in range (2, max - 1):
    number_of_prime_factors = length(list_prime[i])
    number_of_value = i
    for j = 1; j < 1 << number_of_prime_factors; j++:
        value = 0;
        for k in range (0, number_of_prime_factors):
            if ((1 << k) & j) != 0 :
                value += i / list_prime[i][k]
        if number of bit in j % 2 == 0:
            number_of_value -= value
        else:
            number_of_value += value
        result +=  number_of_value

return result

Time complexity should be O(n * 2 ^ m * m) with m < 8 and n <= 10^6

Note: With the Totient function, the time complexity should be further decreased to O (n * m), thanks @Matt Timmermans for his answer

Pham Trung
  • 11,204
  • 2
  • 24
  • 43
  • When I'm bcak homwI will definitely try this out!!! I'm not sure to who I want award the accept, since all suggestions are soiunding very interesting and seem to be helpful! – monamona Jan 24 '18 at 07:15
  • 1
    @monamona in my humble opinion, the key things to solve this problem are to making use of Eratosthenes, the realisation that the number of prime factors is less than 8 for each number and the use of Totient function, so I think either Matt or me has pointed them out, so it should be your choice (to be fair, Matt is the fastest to point out the correct solution, and my answer is a clearer explanation of his answer) – Pham Trung Jan 24 '18 at 07:20
  • Please see my answer for an O(n) algorithm. – גלעד ברקן Jan 25 '18 at 12:00
1

We do not need prime factorizations or inclusion-exclusion to solve this. We can modify Euler's sieve to dynamically compute Euler's totient using his product formula, leading to an O(n) algorithm.

For each number on our accumulating list, we also save its totient. We take advantage of the fact that for tuple (n, phi(n)) where phi is Euler's totient function:

  if m = p * n, for prime p:
    if p does not divide n
      then phi(m) = (p - 1) * phi(n)
      else phi(m) = p * phi(n)

(Note that primes are generated as part of this algorithm by simply incrementing a pointer to the next number not already on the list.)

For example n = 12:

list = [(2,1)]
total = 1
prime: 2

2*2: (2 * 2, 2 * 1)
total = total + 2
list: [(2,1), (4,2)]

2*2*2: (2 * 4, 2 * 2)
total = total + 4
list: [(2,1), (4,2), (8,4)]


prime: 3
total = total + 2
list: [(2,1), (3,2), (4,2), (8,4)]

3*3: (3 * 3, 3 * 2)
total = total + 6
list: [(2,1), (3,2), (4,2), (8,4), (9,6)]

3*2: (3 * 2, 1 * (3-1))
total = total + 2
list: [(2,1), (3,2), (4,2), (6,2), (8,4), (9,6)]

3*4: (3 * 4, 2 * (3-1))
total = total + 4
list: [(2,1), (3,2), (4,2), (6,2), (8,4), (9,6), (12,4)]


prime: 5
total = total + 4
list: [(2,1), (3,2), (4,2), (5,4), (6,2), (8,4), (9,6), (12,4)]

5*2: (5 * 2, 1 * (5-1))
total = total + 4
list: [(2,1), (3,2), (4,2), (5,4), (6,2), (8,4), (9,6), (10,4), (12,4)]


prime: 7
total = total + 6
list: [(2,1), (3,2), (4,2), (5,4), (6,2), (7,6), (8,4), (9,6), (10,4), (12,4)]


prime: 11
total = total + 10
list: [(2,1), (3,2), (4,2), (5,4), (6,2), (7,6), (8,4), (9,6), (10,4), (11,10), (12,4)]

Python code (this code is actually an adaptation of code by btilly, optimizing an idea of mine):

n = 8

total = 0
a = [None for i in range(0, n+1)]
s = []
p = 1
while (p < n):
  p = p + 1

  if a[p] is None:
    print("\n\nPrime: " + str(p))
    a[p] = p - 1
    total = total + a[p]
    s.append(p)

    limit = n / p
    new_s = []

    for i in s:
      j = i
      while j <= limit:
        new_s.append(j)
        print j*p, a[j]
        a[j * p] = a[j] * p if (not j % p) else a[j] * (p-1)
        total = total + a[j * p]
        j = j * p
    s = new_s

print("\n\nAnswer: " + str(total) + " => " + str([(k,d) for k,d in enumerate(a)]))
גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61