
Edit after slothrop's comment:

It seems that takewhile consumes the element that fails its check (see the comment below), so that element is lost and there is no telling whether it would have satisfied the condition on a later pass, which makes the calculation problematic. Using takewhile on an iterator of infinitely many random realisations therefore seems like a bad idea if the realisations have to stay intact.
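
A minimal sketch of the effect (accumulate over a constant list is a deterministic stand-in for the random arrival times):

from itertools import accumulate, takewhile

# Deterministic stand-in for one realisation of attack_arrival_times
attack = accumulate([3, 3, 3, 3, 3, 3])          # yields 3, 6, 9, 12, 15, 18

print(list(takewhile(lambda x: x < 10, attack))) # [3, 6, 9]
# The value 12 has already been pulled out of `attack` and discarded,
# so any later pass over the same iterator starts at 15 and never sees 12.
print(list(takewhile(lambda x: x < 20, attack))) # [15, 18]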

End Edit

Problem:

I wrote a Monte Carlo simulation twice, once as an itertools implementation and once in an orthodox style. The itertools one shows some weird behaviour and, from what I can tell, does not give the right results... the other one should work fine...

I wonder why this is; apologies for the large amount of code.

Idea:

My guess is that the takewhile function behaves oddly when mixed with random variables. Notice that in the simulate_freqs_iter function there are other specifications commented out (e.g. zip(takewhile(lambda x: x < t, honest_arrival_times), attack_arrival_times)); they still yield significantly different results, which I cannot explain to myself...
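
For illustration, here is a deterministic sketch (plain lists stand in for the random arrival times) of why the placement of takewhile inside zip matters: zip pulls from its arguments left to right, so a takewhile on the right-hand side silently consumes one extra element from the left-hand iterator before the zip stops:

from itertools import takewhile

honest = iter([2, 4, 6, 8])
attack = iter([1, 3, 5, 7])
print(list(zip(honest, takewhile(lambda x: x < 5, attack))))  # [(2, 1), (4, 3)]
print(list(honest))  # [8]    -- 6 was consumed by zip, 5 was dropped by takewhile
print(list(attack))  # [7]

honest = iter([2, 4, 6, 8])
attack = iter([1, 3, 5, 7])
print(list(zip(takewhile(lambda x: x < 5, honest), attack)))  # [(2, 1), (4, 3)]
print(list(honest))  # [8]    -- 6 was dropped by takewhile
print(list(attack))  # [5, 7] -- nothing extra consumed here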

Before you ask: I'm aware that n=1e5 is too small. I was aiming for n=1e8 or n=1e9, but I won't expect you to run the program for 20 hours. With 1e5 it finishes in 10 seconds or so, and the difference is already very stark!

I would be very grateful for an answer, as I have fallen in love with itertools a bit too much (the itertools implementation is not even faster or more efficient in any other way...).

Code:

from random import expovariate, seed
from itertools import accumulate, takewhile, count, islice, chain

seed("31.05.2023")

def simulate_freqs_iter(k:int, mu:float, lam:float, n:int, conditional_on_finite:bool=True, verbose:bool=False) -> dict:
    """Simulates n monte-carlo runs of a slow poisson process catching up with a fast one
    returns a dictionary containing the empirical cdf for t={10, 11, ..., 50}"""
    abs_freqs = {t: 0 for t in range(10, 51)}
    for i in range(n):
        # Waiting times for each poisson process
        honest_arrival_times = accumulate(((expovariate(mu)) for _ in count(start=0, step=1)))
        attack_arrival_times = islice( # attack_arrival_times is k blocks behind, but only needs to catch up, not surpass
            accumulate(((expovariate((lam))) for _ in count(start=0, step=1))), k-1, None)

        # Optional printing function
        if verbose and (i%50000)==0:
            first_10_honest = list(islice(honest_arrival_times, 10))
            first_10_attack = list(islice(attack_arrival_times, 10))
            print(list(zip(first_10_honest, first_10_attack)))
            honest_arrival_times = chain(first_10_honest, honest_arrival_times)
            attack_arrival_times = chain(first_10_attack, attack_arrival_times)

        # Simulates poisson waiting times and counts number of hits where 1 catches up with 2
        for t in abs_freqs.keys():
            # Every time N_2 surpasses N_1 we have a hit
            hits = list((t, honest, attack) for honest, attack in 
                    # Take all jump times of N_1 and see if N_2 was there before N_1 within t
                        # takewhile(lambda x: min(x[0], x[1]) < t, zip(honest_arrival_times, attack_arrival_times))
                        # zip(takewhile(lambda x: x < t, honest_arrival_times), attack_arrival_times)
                        zip(honest_arrival_times, takewhile(lambda x: x < t, attack_arrival_times))
                        # zip(takewhile(lambda x: x < t, honest_arrival_times), takewhile(lambda x: x < t, attack_arrival_times))
                        if honest > attack)
            if hits:
                if verbose: print(hits[0])
                abs_freqs[t]+=1
                # Don't continue this path once found a hit
                break
    
    
    # Condition on the probability that it ever catches up
    prob_finite = min(((lam/mu)**k), 1) if conditional_on_finite else 1
    
    # Accumulate relative frequencies to get cdf    
    results = {key:((value/n) / prob_finite) for key, value in dict(
        accumulate(abs_freqs.items(), lambda prev,curr: (curr[0], (curr[1]+prev[1])))).items()}
    return results


def simulate_freqs_orthodox(k:int, mu:float, lam:float, n:int, conditional_on_finite:bool=True, verbose:bool=False) -> dict:
    """Simulates n monte-carlo runs of a slow poisson process catching up with a fast one
    returns a dictionary containing the empirical cdf for t={10, 11, ..., 50}"""
    abs_freqs = {t: 0 for t in range(10, 51)}

    # We run the monte carlo n times
    for _ in range(n):
        # Initialise first block arrival
        next_honest_arrival = expovariate(mu)
        next_attack_arrival = expovariate(lam)

        # the attacker chain is behind k blocks
        for _ in range(k-1):
            next_attack_arrival += expovariate(lam)

        for t in abs_freqs.keys():
            while next_attack_arrival < t:
                # count when a double spend attack is successful
                if next_honest_arrival > next_attack_arrival:
                    abs_freqs[t] += 1
                    if verbose: print(next_honest_arrival, next_attack_arrival)
                    # stop searching for all t on this path once found a hit
                    break

                # Add another block to each chain
                next_honest_arrival += expovariate(mu)
                next_attack_arrival += expovariate(lam)
            else:
                # If while-loop terminates without a hit, continue with greater t
                continue
            # Terminate search for this path after a successful hit
            break
    
    # Condition on the probability that it ever catches up
    prob_finite = min(((lam/mu)**k), 1) if conditional_on_finite else 1
    
    # Accumulate relative frequencies to get cdf    
    results = {key:((value/n) / prob_finite) for key, value in dict(
        accumulate(abs_freqs.items(), lambda prev,curr: (curr[0], (curr[1]+prev[1])))).items()}
    return results

# Monte Carlo simulation of probabilities for T_k
k=10
mu=7/10
lam=3/10
n=int(1e5)

print(simulate_freqs_iter(k=k, mu=mu, lam=lam, n=n, conditional_on_finite=True, verbose=False))
print(simulate_freqs_orthodox(k=k, mu=mu, lam=lam, n=n, conditional_on_finite=True, verbose=False))
    Don't know if this is the cause, but note that `takewhile` "loses" the element that fails its condition. For example: `it = iter([1, 4, 6, 4, 1]); l1 = list(takewhile(lambda x: x<5, it)); l2 = list(it)`. The element `6` ends up in neither `l1` nor `l2` there. – slothrop May 31 '23 at 09:59
  • Uff, you're right - that will mess up the probability bounds... thank you for clarifying this... I don't see an easy way to fix it, as iterators are so fleeting. Teeing doesn't seem to work with random values... the resulting realisations will still be different... To just add a sufficiently large realisation I would have to know what is sufficiently large and kill the iterator... Thanks for the answer! – WonderfulWonder May 31 '23 at 13:51
  • @WonderfulWonder you can write your own `takewhile` generator function easily if you want to customize how it works. I find that the built-in `itertools` aren't often that useful and I end up writing mostly my own generator functions in cases like this. – shadowtalker May 31 '23 at 14:14
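
A hand-rolled replacement along the lines of the last comment could peek at the next value before consuming it, so the boundary element stays in the iterator for the next pass (a sketch only; Peekable and takewhile_no_loss are made-up names, not part of itertools):

class Peekable:
    """Wraps an iterator so the next value can be inspected without consuming it."""
    _EMPTY = object()

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        if self._cache is not self._EMPTY:
            value, self._cache = self._cache, self._EMPTY
            return value
        return next(self._it)

    def peek(self):
        if self._cache is self._EMPTY:
            self._cache = next(self._it)
        return self._cache


def takewhile_no_loss(predicate, peekable):
    """Like itertools.takewhile, but stops *before* consuming the first
    element that fails the predicate, so that element remains available."""
    while True:
        try:
            nxt = peekable.peek()
        except StopIteration:
            return
        if not predicate(nxt):
            return
        yield next(peekable)


# Usage with the simulation's arrival times:
# arrivals = Peekable(accumulate(expovariate(lam) for _ in count()))
# within_t = list(takewhile_no_loss(lambda x: x < t, arrivals))
# next(arrivals) now returns the first arrival >= t instead of skipping it.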
