4

I'm trying to solve a problem in TalentBuddy using Python

The problem is :

Given a number N. Print to the standard output the total number of subsets that can be formed using the {1,2..N} set, but making sure that none of the subsets contain any two consecutive integers. The final count might be very large, this is why you must print the result modulo 524287.

I've worked the code. All of the tests are OK, except Test 6. I got OverFlowError when the test is submitting 10000000 as the argument of my function. I don't know what should I do to resolve this error

My code :

import math
def count_subsets(n):
    step1 = (1 / math.sqrt(5)) * (((1 + math.sqrt(5)) / 2) ** (n + 2))
    step2 = (1 / math.sqrt(5)) * (((1 - math.sqrt(5)) / 2) ** (n + 2))
    res = step1 - step2
    print int(res) % 524287

I guess this is taking up memory a lot. I wrote this after I found a mathematical formula to the same topic on the Internet. mathematical formula
I guess my code isn't Pythonic at all.

How to do this, the "Pythonic" way? How to resolve the OverFlowError?

EDIT: In the problem, I've given the example input 3, and the result (output) is 5.

Explanation: The 5 sets are, {}, {1}, {2}, {3}, {1,3}.

However, in Test 6, the problem I've given are:

Summary for test #6

Input test:

[10000000]

Expected output:

165366

Your output:

Traceback (most recent call last):
  On line 4, in function count_subsets:
    step1 = (1 / math.sqrt(5)) * (((1 + math.sqrt(5)) / 2) ** (n + 2))
OverflowError: 
possibility0
  • 121
  • 9
  • Split you step1 and step2 to mini steps, i tried it and i had no errors, the result was 0 so maybe there is some other problem. also try to use [Decimal](http://docs.python.org/2/library/decimal.html#module-decimal) – Kobi K Oct 15 '13 at 14:45
  • 1
    How precise do you need the solution. What output do you expect from count_subsets() ? – OldTinfoil Oct 15 '13 at 16:33
  • @IntrepidBrit I edited out the question, written the example input/output, and the error test – possibility0 Oct 15 '13 at 23:23
  • Next question: Why do you take the integer & modulus? Python is more than capable of handling those large numbers. My local test code is happily chomping though those numbers with ridiculous precision. – OldTinfoil Oct 16 '13 at 13:00
  • @IntrepidBrit First of all, the exact words from the problem : "The final count might be very large, this is why you must print the result modulo 524287" Second, `res` will contain floating number (ex. `5.0`), and the example output is an integer. So in the end I converted `res` into an integer – possibility0 Oct 16 '13 at 13:41
  • Well, although the value is very large, you don't need to take the modulo when using Decimals. In fact, by casting to int can be a lengthy process if you have sufficient precision. Give me a few minutes - I'll throw an answer together. – OldTinfoil Oct 16 '13 at 14:12
  • @RyoArmanda - Could you link us to where you found the mathematical simplification? – OldTinfoil Oct 17 '13 at 15:22
  • @IntrepidBrit I found the mathematical expression through [here](http://www.csie.ntu.edu.tw/~lyuu/dm/2013/20130516.pdf‎/). I cannot confirm this is true or not. Maybe is there another statemnet? – possibility0 Oct 18 '13 at 13:52
  • @RyoArmanda - what version of Python does TalentBuddy use? – OldTinfoil Oct 18 '13 at 16:10

1 Answers1

5

Let f(N) be the number of subsets that contain no consecutive numbers. There's F(N-2) subsets that contain N, and F(N-1) subsets that don't contain N. This gives:

F(N) = F(N-1) + F(N-2).

F(0) = 1 (there's 1 subset of {}, namely {}).
F(1) = 2 (there's 2 subsets of {1}, namely {} and {1}).

This is the fibonacci sequence, albeit with non-standard starting conditions.

There is, as you've found, a formula using the golden ratio to calculate this. The problem is that for large N, you need more and more accuracy in your floating-point calculation.

An exact way to do the calculation is to use iteration:

a_0 = 1
b_0 = 2
a_{n+1} = b_n
b_{n+1} = a_n + b_n

The naive version of this is easy but slow.

def subsets(n, modulo):
    a, b = 1, 2
    for _ in xrange(n):
        a, b = b, (a + b) % modulo
    return a

Instead, a standard trick is to write the repeated application of the recurrences as a matrix power:

( a_n ) = | 0 1 |^N  ( 1 )
( b_n ) = | 1 1 |  . ( 2 )

You can compute the matrix power (using modulo-524287 arithmetic) by repeated squaring. See Exponentiation by squaring. Here's complete code:

def mul2x2(a, b, modulo):
    result = [[0, 0], [0, 0]]
    for i in xrange(2):
        for j in xrange(2):
            for k in xrange(2):
                result[i][j] += a[i][k] * b[k][j]
                result[i][j] %= modulo
    return result

def pow(m, n, modulo):
    result = [[1, 0], [0, 1]]
    while n:
        if n % 2: result = mul2x2(result, m, modulo)
        m = mul2x2(m, m, modulo)
        n //= 2
    return result

def subsets(n):
    m = pow([[0, 1], [1, 1]], n, 524287)
    return (m[0][0] + 2 * m[0][1]) % 524287

for i in xrange(1, 10):
    print i, subsets(i)
for i in xrange(1, 20):
    print i, subsets(10 ** i)

This prints solutions for every power of 10 up to 10^19, and it's effectively instant (0.041sec real on my laptop).

Paul Hankin
  • 54,811
  • 11
  • 92
  • 118
  • This code is so good! I didn't even know you can approach this by matrices, and somehow you managed it to run really fast. Big props to you! – possibility0 Oct 19 '13 at 01:36
  • related: [nth fibonacci number in sublinear time](http://stackoverflow.com/q/1525521/4279), especially [this answer that also based on the matrix power](http://stackoverflow.com/a/1526036/4279). – jfs Oct 19 '13 at 20:42
  • [Here's another implementation of a fast Fibonacci recurrence.](http://stackoverflow.com/a/14782458/68063) – Gareth Rees Oct 30 '13 at 12:16