
Update: this issue can be resolved using the fixes present in https://github.com/zbentley/AnyEvent-Impl-Perl-Improved/tree/io-starvation

Context:

I am integrating AnyEvent with some otherwise-synchronous code. The synchronous code needs to install some watchers (on timers, child processes, and files), wait for at least one watcher to complete, do some synchronous/blocking/legacy stuff, and repeat.

I am using the pure-perl AnyEvent::Loop-based event loop, which is good enough for my purposes at this point; most of what I need it for is signal/process/timer tracking.
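For anyone reproducing this, the pure-Perl backend can be pinned explicitly via AnyEvent's documented PERL_ANYEVENT_MODEL environment variable (a minimal sketch):

# Force the AnyEvent::Loop-based pure-Perl backend before AnyEvent loads:
BEGIN { $ENV{PERL_ANYEVENT_MODEL} = "Perl" }
use AnyEvent; # $AnyEvent::MODEL will be "AnyEvent::Impl::Perl"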

The problem:

If I have a callback that can block the event loop for a moment, child-process-exit events/callbacks never fire. The simplest example I could make watches a child process and runs an interval timer. The interval timer does something blocking before it finishes:

use strict;
use warnings;
use feature 'say'; # say() is used below
use AnyEvent;

# Start a timer that, every 0.5 seconds, sleeps for 1 second, then prints "timer":
my $w2 = AnyEvent->timer(
    after => 0,
    interval => 0.5,
    cb => sub {
        sleep 1; # Simulated blocking operation. If this is removed, everything works.
        say "timer";
    },
);

# Fork off a pid that waits for 1 second and then exits:
my $pid = fork();
if ( $pid == 0 ) {
    sleep 1;
    exit;
}

# Print "child" when the child process exits:
my $w1 = AnyEvent->child(
    pid => $pid,
    cb => sub {
        say "child";
    },
);

AnyEvent->condvar->recv;

This code leaves the child process zombied and prints "timer" over and over, seemingly forever (I ran it for several minutes). If the sleep 1 call is removed from the timer's callback, the code works correctly and the child-process watcher fires as expected.

I'd expect the child watcher to run eventually (at some point after the child exited, and any interval events in the event queue ran, blocked, and finished), but it does not.

The sleep 1 could be any blocking operation; it can be replaced with a busy-wait or anything else that takes long enough (see the sketch below). It doesn't even need to take a full second; it appears to only need to a) be running during the child-exit event/SIGCHLD delivery, and b) result in the next interval always being due according to the wall clock.
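For instance, a busy-wait stand-in for the sleep (illustrative only, not from the original program) triggers the same starvation:

use Time::HiRes qw(time);

# Burn CPU for one second instead of sleeping; the effect on the
# event loop is identical: the callback blocks, and the next
# interval is already due by the time it returns.
my $deadline = time() + 1;
1 while time() < $deadline;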

Questions:

Why isn't AnyEvent ever running my child-process watcher callback?

How can I multiplex child-process-exit events with interval events that may block for so long that the next interval becomes due?

What I've tried:

My theory is that timer events which become "ready" due to time spent outside of the event loop can indefinitely pre-empt other types of ready events (like child process watchers) somewhere inside AnyEvent. I've tried a few things:

  • Using AnyEvent::Strict doesn't surface any errors or change behavior in any way.
  • Partial solution: Removing the interval watcher at any point does make the child-process watcher fire (as if some internal event polling/queue population inside AnyEvent only happens when no timer events are already "ready" according to the wall clock). Drawback: this doesn't work in the general case, since I'd have to know when my child process had exited in order to know when to remove my intervals, which is circular.
  • Partial solution: Unlike child-process watchers, interval timers seem to multiplex with each other just fine, so I can install a manual waitpid call in another interval timer to check for and reap children (see the sketch after this list). Drawbacks: child reaping can be arbitrarily delayed (my use case involves lots of frequent process creation/destruction); any AnyEvent->child watchers that are installed and fire successfully will auto-reap the child without telling my waitpid timer, requiring extra orchestration; and it generally feels like I'm misusing AnyEvent.
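A minimal sketch of that waitpid-based workaround (the %watched_pids structure, the 0.1-second period, and the callback shape are illustrative assumptions, not code from the question):

use POSIX ":sys_wait_h"; # for WNOHANG

my %watched_pids; # pid => callback, maintained by the caller

# Poll for exited children on a short interval; interval timers
# multiplex with each other fine, so this keeps firing even when
# other timers block:
my $reaper = AnyEvent->timer(
    after    => 0.1,
    interval => 0.1,
    cb       => sub {
        # Reap every child that has exited since the last tick:
        while ( ( my $pid = waitpid( -1, WNOHANG ) ) > 0 ) {
            my $cb = delete $watched_pids{$pid};
            $cb->( $pid, $? ) if $cb;
        }
    },
);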
  • I don't know whether this is relevant, but the `AnyEvent` documentation says: *"This means you cannot create a child watcher as the very first thing in an AnyEvent program, you have to create at least one watcher before you fork the child"* – Borodin Jun 12 '16 at 01:40
  • Nope, the behavior remains the same even if the timer watcher is created before the fork. I'll update my example to do that; thanks for the info! – Zac B Jun 12 '16 at 03:01

2 Answers


The interval is the time between the starts of successive timer callbacks, i.e. not the time between the end of one callback and the start of the next. You set up a timer with an interval of 0.5 seconds whose action is to sleep for one second. This means that once the timer is triggered it will immediately be triggered again and again, because by the time the callback returns the next interval is already due.

Thus, depending on the implementation of the event loop, it may happen that no other events get processed because the loop is busy running the same timer over and over. I don't know which underlying event loop you are using (check $AnyEvent::MODEL; a sketch of that check follows below), but if you look at the source code of AnyEvent::Loop (the loop for the pure-Perl implementation, i.e. model AnyEvent::Impl::Perl) you will find the following code:

   if (@timer && $timer[0][0] <= $MNOW) {
      do {
         my $timer = shift @timer;
         $timer->[1] && $timer->[1]($timer);
      } while @timer && $timer[0][0] <= $MNOW;
      # ... (rest of the loop body elided)
   }

As you can see, it stays busy executing timers for as long as there are timers due to run. And with your interval setting (0.5) and the timer's behavior (sleeping one second), there will always be a timer that needs to be executed.
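For reference, this is how the backend check mentioned above can be done (a small sketch; AnyEvent::detect and $AnyEvent::MODEL are documented AnyEvent API):

use feature 'say';
use AnyEvent;

# detect() forces backend selection and returns the model class name;
# $AnyEvent::MODEL holds the same value afterwards.
say AnyEvent::detect(); # e.g. "AnyEvent::Impl::Perl" for the pure-Perl loop
say $AnyEvent::MODEL;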

If you instead change your timer so that there is actual room for processing other events, by making the interval larger than the blocking time (say, 2 seconds instead of 0.5), everything works fine:

...
interval => 2,
cb => sub {
    sleep 1; # Simulated blocking operation. Sleep less than the interval!
    say "timer";
},
...

Output:

timer
child
timer
timer
  • That answer seems correct, but it also shows that this event loop is really, really broken. What if the blocking operation in the timer callback could take a variable amount of time? Then there's a chance that other event emitters could never fire because things pre-empt them; there's no underlying queue. Sigh. Marking as accepted, with sadness :p – Zac B Jun 13 '16 at 12:25
  • @ZacB: The child exit event is not lost, it just does not run because you are busy doing other stuff. If you stop doing other stuff, the event gets delivered. It's not a problem of the event loop; the problem is that you are asking for the impossible, i.e. executing a task that takes one second every 500 ms in a single-threaded application. It's like trying to buy goods for $100 when you only have $50. – Steffen Ullrich Jun 13 '16 at 13:17
  • It seems more reasonable to have interval timers work on an "at most once every X seconds" model, and to queue events internally. That is, the event loop's run() function should check timers and push callbacks onto a queue (signals etc., when they fire, should also push callbacks onto the queue). After that initial "ready event detection" step has run, callbacks should be popped off the queue and run in FIFO order; a sketch of this model follows these comments. – Zac B Jun 13 '16 at 13:26
  • @ZacB: Changing the loop would make it behave differently but there is still no way to buy all the goods for $100 if you only have $50. All what you've changed is the selection of goods you can buy. And what might be a more reasonable selection for you is still a behavior change and this might affect others in a negative way. Again: the problem is not the event loop but that you are asking for the impossible. – Steffen Ullrich Jun 13 '16 at 14:04
  • I don't think it's as simple as you say; a queued model is just a different set of guarantees for event delivery. An interval doesn't have to mean "run this roughly every $duration even at the expense of other pending events"; it can also mean "run this every $duration unless there are pending events". It's possible to simulate this by deferring AnyEvent intervals until the next event run; I'll put that in a separate answer (yours is still the correct one). – Zac B Jun 15 '16 at 19:47
  • @ZacB: your code explicitly says that it should execute, every 0.5 seconds, a task which runs for 1 second; that's how interval is defined. Additionally you wait for a child exit. Whatever the event loop does, it will not be possible to reach all the given goals in a single-threaded application, and thus the implementation needs to be unfair to some tasks. You are complaining that the implementation deals with this impossible task unfairly in the wrong way, because you would prefer it to be unfair in another way. In my opinion it is a bad idea to ask for impossible things at all. – Steffen Ullrich Jun 15 '16 at 20:07
  • It's perfectly possible to make event loops that deal with this scenario. In fact, I'd go so far as to say that most event loops deal with it properly. No interval fires exactly on time; all computations have an overhead. If that overhead is unpredictable, the event loop shouldn't starve random emitters. – Zac B Jun 15 '16 at 20:29
  • Most JavaScript engines (some of the most reliability-critical event loops in existence) use queues behind the call(back) stacks they present to users to deal with this exact situation (user callbacks that can block the loop for an arbitrary amount of time). See: https://developer.mozilla.org/en-US/docs/Web/JavaScript/EventLoop http://stackoverflow.com/questions/21607692/understanding-the-event-loop – Zac B Jun 15 '16 at 20:29
  • @ZacB: again, the child exit event is **not lost**. Once you've stopped the timers the event will be delivered. And an event queue would just be differently unfair, and in the worst case would grow longer and longer or skip the wrong kind of events. There simply is no fairness if you are overcommitting resources, just different kinds of unfairness. If you don't want an interval timer, but instead want to re-establish the timer after the last callback has finished, then you should do exactly that and not use an interval timer. – Steffen Ullrich Jun 15 '16 at 20:40
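A rough sketch of the queued dispatch model proposed in these comments (illustrative Perl, not AnyEvent's actual implementation; the data structures and the one_tick() name are assumptions):

# Phase 1: scan every emitter type and queue whatever is due.
# Phase 2: drain the queue in FIFO order. An always-due timer can
# then delay other emitters, but never starve them indefinitely.
my @ready; # FIFO queue of callbacks that are due to run

sub one_tick {
    my ( $timers, $fired_signals ) = @_; # arrayrefs, caller-maintained
    my $now = time;

    # Detect ready events of all types before running anything:
    push @ready, map { $_->{cb} } grep { $_->{due} <= $now } @$timers;
    push @ready, @$fired_signals;

    # Run what was queued, in arrival order:
    ( shift @ready )->() while @ready;
}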

Update: this issue can be resolved using the fixes present in https://github.com/zbentley/AnyEvent-Impl-Perl-Improved/tree/io-starvation

@steffen-ullrich's answer is correct, but it points out a very flawed behavior in AnyEvent: since there is no underlying event queue, certain kinds of events that always report as "ready" can indefinitely pre-empt others.

Here is a workaround:

For interval timers that are always "ready" due to a blocking operation that happens outside of the event loop, starvation can be prevented by chaining interval invocations onto the next run of the event loop, like this:

use strict;
use warnings;
use feature 'say'; # say() is used below
use AnyEvent;

sub deferred_interval {
    my %args = @_;
    # Some silly wrangling to emulate AnyEvent's normal
    # "watchers are uninstalled when they are destroyed" behavior.
    # The caller's scalar is adopted on the first call; chained
    # re-invocations find it in $args{oldref} instead:
    if ( my $ref = delete $args{reference} ) {
        $$ref = 1; # mark the watcher as alive
        $args{oldref} //= $ref;
    }
    # Stop rescheduling once the caller clears/destroys the watcher:
    return unless ${ $args{oldref} };

    # Defer (re)arming the timer until the event loop regains
    # control, so an always-due interval cannot starve other watchers:
    AnyEvent::postpone {
        ${ $args{oldref} } = AnyEvent->timer(
            after => delete( $args{after} ) // $args{interval},
            cb    => sub {
                $args{cb}->(@_);
                deferred_interval(%args); # chain the next invocation
            },
        );
    };

    return ${ $args{oldref} };
}

# Start a timer that, at most once every 0.5 seconds, sleeps
# for 1 second, and then prints "timer":
my $w1; $w1 = deferred_interval(
    after => 0.1,
    reference => \$w1,
    interval => 0.5,
    cb => sub {
        sleep 1; # Simulated blocking operation.
        say "timer";
    },
);

# Fork off a pid that waits for 1 second and then exits:
my $pid = fork();
if ( $pid == 0 ) {
    sleep 1;
    exit;
}

# Print "child" when the child process exits:
my $w2 = AnyEvent->child(
    pid => $pid,
    cb => sub {
        say "child";
    },
);

AnyEvent->condvar->recv;

Using that code, the child-process watcher will fire more or less on time, and the interval will keep firing. The tradeoff is that each interval timer is only re-armed after the blocking callback finishes. Given an interval time of I and a blocking-callback runtime of B, this approach fires an interval event roughly every I + B seconds, while the original approach from the question fires one roughly every max(I, B) seconds (at the cost of potential starvation). With I = 0.5 and B = 1, that is every ~1.5 seconds here versus every ~1 second in the question's version.

I think that a lot of the headaches here could be avoided if AnyEvent had a backing queue (many common event loops take this approach to prevent exactly this situation), or if the implementation of AnyEvent::postpone installed a "NextTick"-like event emitter that fired only after all other emitters had been checked for events.
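To illustrate the scheduling behavior the workaround relies on (a minimal sketch; AnyEvent::postpone is documented AnyEvent API, the rest is illustrative):

use feature 'say';
use AnyEvent;

# postpone defers the block until the event loop regains control,
# i.e. it never runs from within the currently executing code:
my $cv = AnyEvent->condvar;
AnyEvent::postpone { say "second"; $cv->send };
say "first";
$cv->recv; # prints "first", then "second"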

  • I think this should be a call to `deferred_interval` instead of the undefined `chain_interval`. But in any case this is no longer an interval timer (i.e. a fixed interval between timers), but a timer which gets re-established after the original callback has finished. And this way you don't work around "a very flawed behavior"; you simply no longer expect impossible things from the event loop. – Steffen Ullrich Jun 15 '16 at 20:51
  • Fixed, thanks! Again (though we have probably beaten this one to death in the question comments), I think that predictable "unfairness" (really just fair callback scheduling for fully utilized event loops) is vastly preferable to starvation of other event emitters based on emitter *type*. When running arbitrary user code, I think that a robust event loop implementation must do predictable things even in the face of unexpected blockage; queueing is one way around that. Explicit emitter precedence (really just priority queueing) is another. Potentially-indefinite starvation is not. – Zac B Jun 15 '16 at 21:10
  • For an example of what I mean, try running something like this via Node (which has a queueing event loop): http://pastebin.com/AR09NqXw The loop is still fully utilized (the blocking time of the callbacks is greater than the interval of any timer installed), but interval queues don't "fill up" and leak memory, and signals/immediate events/etc. are still dequeued and run in FIFO order. – Zac B Jun 15 '16 at 22:01
  • Since you want to continue this discussion... Again, you are expecting a specific kind of unfairness, probably because you are used to exactly this kind of unfairness. But in my opinion the main problem is still that you are asking for the impossible; the program should not require the event loop to be **permanently** unfair. Your current solution fixes this by no longer asking for the impossible, which is the right approach. Again, the child exit event is not lost. It is only not delivered as long as you are keeping the loop busy with other tasks. – Steffen Ullrich Jun 16 '16 at 05:17
  • I understand that it's not lost; that's why I'm referring to "starvation", not "loss". All event loops are, by your definition, permanently unfair: if two timers are (in the same tick) scheduled to run 1 second in the future, one of the two callbacks will have to wait for the other to complete, so it won't fire on time; that's "unfair". Predictable unfairness is much more useful than unpredictable unfairness; AnyEvent handles the two-timers case correctly (fires one then the other), but some other emitter types (signals) are not handled the same way. That's inconsistent and dangerous. – Zac B Jun 16 '16 at 14:14
  • One of the benefits of event loops versus exact scheduling (threads) is that they can *multiplex* lots of callbacks, even when fully utilized/never blocking waiting for events. A multiplexer is going to have some amount of inherent unfairness, and it should minimize that (e.g. by using an internal clock so that deferred timers don't wait for `user time + timer time`, which is why my workaround is a poor solution), but some degree of predictability or fairness is key. Many event loops are designed to run at full utilization: events are ready every time user callbacks return control to the loop. – Zac B Jun 16 '16 at 14:20