Efficiently generating Stern's Diatomic Sequence

Question

Stern's Diatomic Sequence can be read about in more details over here; however, for my purpose I will define it now.

Definition of Stern's Diatomic Sequence

Let n be a number to generate the fusc function out of. Denoted fusc(n).

If n is 0 then the returned value is 0.
If n is 1 then the returned value is 1.

If n is even then the returned value is fusc(n / 2).
If n is odd then the returned value is fusc((n - 1) / 2) + fusc((n + 1) / 2).

Currently, my Python code brute forces through most of the generation, other than the dividing by two part since it will always yield no change.

def fusc (n):
    if n <= 1:
        return n

    while n > 2 and n % 2 == 0:
        n /= 2

    return fusc((n - 1) / 2) + fusc((n + 1) / 2)

However, my code must be able to handle digits in the magnitude of ~~1000s~~ millions of bits, and recursively running through the function ~~thousands~~ millions of times does not seem very efficient or practical.

Is there any way I could algorithmically improve my code such that massive numbers can be passed through without having to recursively call the function so many times?

Read about memoization: https://en.wikipedia.org/wiki/Memoization It can be wonderfuly done with python, but right now I am not sure whether it is a good solution here, as you won't calculate same thing too many times... — Filip Malczak, Dec 19 '16 at 18:57
Memoization might help slightly once descending into smaller numbers, but I need to be able to compute large numbers off the bat incredibly quickly, although implementing something like this would definitely improve long-time performance, it is not really what I am looking for. — Ziyad Edher, Dec 19 '16 at 18:59
That's why it became a comment instead of an answer. Good luck. — Filip Malczak, Dec 19 '16 at 19:01

score 9 · Answer 1 · edited Jun 20 '20 at 09:12

With memoization for a million bits, the recursion stack would be extremely large. We can first try to look at a sufficiently large number which we can work by hand, fusc(71) in this case:

fusc(71) = fusc(35) + fusc(36)

fusc(35) = fusc(17) + fusc(18)
fusc(36) = fusc(18)

fusc(71) = 1 * fusc(17) + 2 * fusc(18)

fusc(17) = fusc(8) + fusc(9)
fusc(18) = fusc(9)

fusc(71) = 1 * fusc(8) + 3 * fusc(9)

fusc(8) = fusc(4)
fusc(9) = fusc(4) + fusc(5)

fusc(71) = 4 * fusc(4) + 3 * fusc(5)

fusc(4) = fusc(2)
fusc(3) = fusc(1) + fusc(2)

fusc(71) = 7 * fusc(2) + 3 * fusc(3)

fusc(2) = fusc(1)
fusc(3) = fusc(1) + fusc(2)

fusc(71) = 11 * fusc(1) + 3 * fusc(2)

fusc(2) = fusc(1)

fusc(71) = 14 * fusc(1) = 14

We realize that we can avoid recursion completely in this case as we can always express fusc(n) in the form a * fusc(m) + b * fusc(m+1) while reducing the value of m to 0. From the example above, you may find the following pattern:

if m is odd:
a * fusc(m) + b * fusc(m+1) = a * fusc((m-1)/2) + (b+a) * fusc((m+1)/2)
if m is even:
a * fusc(m) + b * fusc(m+1) = (a+b) * fusc(m/2) + b * fusc((m/2)+1)

Therefore, you may use a simple loop function to solve the problem in O(lg(n)) time

def fusc(n):
  if n == 0: return 0
  a = 1
  b = 0
  while n > 0:
    if n%2:
      b = b + a
      n = (n-1)/2
    else:
      a = a + b
      n = n/2
  return b

hiro protagonist · Accepted Answer · 2016-12-20T17:24:11.197

lru_cache works wonders in your case. make sure maxsize is a power of 2. may need to fiddle a bit with that size for your application. cache_info() will help with that.

also use // instead of / for integer division.

from functools import lru_cache


@lru_cache(maxsize=512, typed=False)
def fusc(n):

    if n <= 1:
        return n

    while n > 2 and n % 2 == 0:
        n //= 2

    return fusc((n - 1) // 2) + fusc((n + 1) // 2)

print(fusc(1000000000078093254329870980000043298))
print(fusc.cache_info())

and yes, this is just meomization as proposed by Filip Malczak.

you might gain an additional tiny speedup using bit-operations in the while loop:

while not n & 1:  # as long as the lowest bit is not 1
    n >>= 1       # shift n right by one

UPDATE:

here is a simple way of doing meomzation 'by hand':

def fusc(n, _mem={}):  # _mem will be the cache of the values 
                       # that have been calculated before
    if n in _mem:      # if we know that one: just return the value
        return _mem[n]

    if n <= 1:
        return n

    while not n & 1:
        n >>= 1
    if n == 1:
        return 1    

    ret = fusc((n - 1) // 2) + fusc((n + 1) // 2)
    _mem[n] = ret  # store the value for next time
    return ret

UPDATE

after reading a short article by dijkstra himself a minor update.

the article states, that f(n) = f(m) if the fist and last bit of m are the same as those of n and the bits in between are inverted. the idea is to get n as small as possible.

that is what the bitmask (1<<n.bit_length()-1)-2 is for (first and last bits are 0; those in the middle 1; xoring n with that gives m as described above).

i was only able to do small benchmarks; i'm interested if this is any help at all for the magitude of your input... this will reduce the memory for the cache and hopefully bring some speedup.

def fusc_ed(n, _mem={}):

    if n <= 1:
        return n

    while not n & 1:
        n >>= 1
    if n == 1:
        return 1

    # https://www.cs.utexas.edu/users/EWD/transcriptions/EWD05xx/EWD578.html
    # bit invert the middle bits and check if this is smaller than n
    m = n ^ (1<<n.bit_length()-1)-2
    n = m if m < n else n

    if n in _mem:
        return _mem[n]

    ret = fusc(n >> 1) + fusc((n >> 1) + 1)
    _mem[n] = ret
    return ret

i had to increase the recursion limit:

import sys
sys.setrecursionlimit(10000)  # default limit was 1000

benchmarking gave strange results; using the code below and making sure that i always started a fresh interperter (having an empty _mem) i sometimes got significantly better runtimes; on other occasions the new code was slower...

benchmarking code:

print(n.bit_length())

ti = timeit('fusc(n)', setup='from __main__ import fusc, n', number=1)
print(ti)

ti = timeit('fusc_ed(n)', setup='from __main__ import fusc_ed, n', number=1)
print(ti)

and these are three random results i got:

6959
24.117448464001427
0.013900151001507766

6989
23.92404893300045
0.013844672999766772

7038
24.33894686200074
24.685758719999285

that is where i stopped...

Unfortunately, I am using Python2.7, for which `lru_cache` is not available. I should have added Python2.7 in the tags. — Ziyad Edher, Dec 19 '16 at 19:15
http://stackoverflow.com/a/11861795/4954037 might help there... or you could roll your own. — hiro protagonist, Dec 19 '16 at 19:18
Cannot use any external libraries either. I think the solution needs to be nearly completely algorithmic. — Ziyad Edher, Dec 19 '16 at 19:22
added a meomized version without using `lru_cache`. please try it before discarding the idea of caching values. the speedup is enormous! — hiro protagonist, Dec 19 '16 at 19:24
Your implementation greatly increased the speed of my application; however, I incorrectly reported that my numbers will be in the thousands of bits, while in fact they turned out to be in the millions. To process just one test case I am needing a great deal over 12 seconds, which is the time limit of the processing. — Ziyad Edher, Dec 19 '16 at 19:31
@Wintro: a small update... but don't get your hopes to high... sorry. — hiro protagonist, Dec 20 '16 at 12:25
Wow, thank you! I noticed that article but felt like it was beyond the scope of what I was trying to achieve. It _dramatically_ improved the speed when calculating massive digits. For a million-bit digit, it reduced run time from over 30 seconds to just 4 seconds. — Ziyad Edher, Dec 20 '16 at 13:56
I think https://stackoverflow.com/a/44557597/1400793 should be the accepted answer. The code in the other answer contains no recursion, is clearly O(log n), and greatly simpler. — Paul Hankin, Apr 24 '18 at 20:39
@PaulHankin i agree! this is a great answer! (did not see it before as it was months after mine...). — hiro protagonist, Apr 25 '18 at 06:23

Efficiently generating Stern's Diatomic Sequence

Definition of Stern's Diatomic Sequence

2 Answers2

Linked