why is this memoized Euler14 implementation so much slower in Raku than Python?

Question

I was recently playing with problem 14 of the Euler project: which number in the range 1..1_000_000 produces the longest Collatz sequence?

I'm aware of the issue of having to memoize to get reasonable times, and the following piece of Python code returns an answer relatively quickly using that technique (memoize to a dict):

#!/usr/bin/env python

L = 1_000_000
cllens={1:1}

cltz = lambda n: 3*n + 1 if n%2 else n//2

def cllen(n):
    if n not in cllens: cllens[n] = cllen(cltz(n)) + 1
    return cllens[n]

maxn=1
for i in range(1,L+1):
    ln=cllen(i)
    if (ln > cllens[maxn]): maxn=i

print(maxn)

(adapted from here; I prefer this version that doesn't use max, because I might want to fiddle with it to return the longest 10 sequences, etc.).

I have tried to translate it to Raku staying as semantically close as I could:

#!/usr/bin/env perl6
use v6;

my $L=1_000_000;
my %cllens = (1 => 1);

sub cltz($n) { ($n %% 2) ?? ($n div 2) !! (3*$n+1) }

sub cllen($n) {
    (! %cllens{$n}) && (%cllens{$n} = 1+cllen($n.&cltz));
    %cllens{$n};
}

my $maxn=1;
for (1..$L) {
    my $ln = cllen($_);
    ($ln > %cllens{$maxn}) && ($maxn = $_)
}
say $maxn

Here are the sorts of times I am consistently getting running these:

$ time <python script>
837799

real    0m1.244s
user    0m1.179s
sys     0m0.064s

On the other hand, in Raku:

$ time <raku script>
837799

real    0m21.828s
user    0m21.677s
sys     0m0.228s

Question(s)

Am I mistranslating between the two, or is the difference an irreconcilable matter of starting up a VM, etc.?

Are there clever tweaks / idioms I can apply to the Raku code to speed it up considerably past this?

Aside

Naturally, it's not so much about this specific Euler project problem; I'm more broadly interested in whether there are any magical speedup arcana appropriate to Raku I am not aware of.

Perhaps the *best* way to get a Raku "magical speedup" is to translate it into Python :-) Sorry, couldn't resist, and it's been so long since I used Perl, I probably can't provide a *useful* answer, other than to possibly insert timing traps into your code to see where the delay is. — paxdiablo, Nov 16 '20 at 00:41
:-) Fair enough, but it doesn't detract from the intrinsic interest the question of what's actually happening here holds for me. — grobber, Nov 16 '20 at 00:42
grobber, agreed, I wasn't stating your question was invalid, it's actually quite good. — paxdiablo, Nov 16 '20 at 00:43
@paxdiablo, .@grobber "Perhaps the *best* way to get a Raku "magical speedup" is to translate it into Python ... I probably can't provide a *useful* answer". :-) You have! :) Raku(do) supports a remarkable *automatic* translation system. One just *directly* uses existing (or new) foreign language code (no need for *manual* translation) via an [`Inline`](https://modules.raku.org/search/?q=inline). Use foreign features as if they were native to Raku! All data, exceptions etc. are automatically marshalled. If you click the link you'll see an `Inline::Python` that's being developed. Maybe try it? — raiph, Nov 16 '20 at 13:41
Might see improvements using 'use experimental :cached' feature. — Coke, Nov 16 '20 at 16:40

score 7 · Accepted Answer · answered Nov 16 '20 at 03:56

7

I think the majority of the extra time is because Raku has type checks, and they aren't getting removed by the runtime type specializer. Or if they are getting removed it is after a significant amount of time.

Generally the way to optimize Raku code is first to run it with the profiler:

$ raku --profile test.raku

Of course that fails with a Segfault with this code, so we can't use it.

My guess would be that much of the time is related to using the Hash.

If it was implemented, using native ints for the key and value might have helped:

 my int %cllens{int} = (1 => 1);

Then declaring the functions as using native-sized ints could be a bigger win.
(Currently this is a minor improvement at best.)

sub cltz ( int $n --> int ) {…}
sub cllen( int $n --> int ) {…}
for (1..$L) -> int $_ {…}

Of course like I said native hashes aren't implemented, so that is pure speculation.

You could try to use the multi-process abilities of Raku, but there may be issues with the shared %cllens variable.

The problem could also be because of recursion. (Combined with the aforementioned type checks.)

If you rewrote cllen so that it used a loop instead of recursion that might help.

Note:
The closest to n not in cllens is probably %cllens{$n}:!exists.
Though that might be slower than just checking that the value is not zero.

Also cellen is kinda terrible. I would have written it more like this:

sub cllen($n) {
    %cllens{$n} //= cllen(cltz($n)) + 1
}

answered Nov 16 '20 at 03:56

Brad Gilbert

33,846
11
78
129

2

Thanks! RE: your alternate `cllen`: I would probably have written that too! :-) I use `//=` frequently, but wasn't going for elegance, tried the most straightforward translation, and posted that. – grobber Nov 16 '20 at 05:46
2

Hi @grobber. "posted ... most straightforward translation" I think that helps. I've bookmarked this high quality SO. "I would probably have written that too! :-)" Me three. :) It was the first thing I tried last night. "wasn't going for elegance". One reason I use `op=` is aesthetic, but it's also generally a lot faster. It halved the run time for me. I chose to pass on commenting because it was the only success I had, and halving 20X slower to 10X slower than python didn't feel like "idioms [plural] I can apply to the Raku code to speed it up **considerably**", nor "magical speedup arcana". – raiph Nov 16 '20 at 12:41
@BradGilbert "Of course [`raku --profile test.raku`] fails with a Segfault with this code, so we can't use it." Of course, I can't tell whether your "Of course" means something like "Ironically" or if there's something obvious I'm missing. :) Got any idea why? Is it because `$L` is set to a million? What if it's dropped to, say, `10`? (I'm not able to access a system with a dev setup at the moment or I'd try myself.) – raiph Nov 16 '20 at 13:50
@raiph I meant it to sound exasperated. As in “Of course it doesn't work, because that would actually be useful.” – Brad Gilbert Nov 16 '20 at 13:53
@BradGilbert Thanks. I hear you about exasperation. In the hope that my comments aren't fuelling it I'll continue. I haven't used the profiler much (and can't at all at the moment), but in the past it's typically worked for me for small runs and not for big ones. Hence my thought about dropping `$L` to a small number to see if that works and provides any insight. – raiph Nov 16 '20 at 14:34
1

@raiph: thanks! A moderate speedup was my experience too on this same machine by changing `cllen` exactly as suggested by @Brad Gilbert (not by a factor of 2 though; running time went down to just under 17s). That alone is good to know: it's always pleasant to learn that aesthetic and performance align. – grobber Nov 16 '20 at 16:07

why is this memoized Euler14 implementation so much slower in Raku than Python?

1 Answers1