I have a chain of many NSBlockOperations with dependencies. If one operation early in the chain fails - I want the other operations to not run. According to docs, this should be easy to do from the outside - if I cancel an operation, all dependent operations should automatically be cancelled.

However - if only the execution-block of my operation "knows" that it failed, while executing - can it cancel its own work?

I tried the following:

    NSBlockOperation *op = [[NSBlockOperation alloc] init];
    __weak NSBlockOperation *weakOpRef = op;
    [op addExecutionBlock:^{
        LOGInfo(@"Say Cheese...");
        if (some_condition == NO) { // for some reason we can't take a photo
            [weakOpRef cancel];
            LOGError(@"Photo failed");
        }
        else {
            // take photo, process it, etc.
            LOGInfo(@"Photo taken");
        }
    }];

However, when I run this, other operations dependent on op are executed even though op was cancelled. Since they are dependent - surely they're not starting before op finished, and I verified (in debugger and using logs) that isCancelled state of op is YES before the block returns. Still the queue executes them as if op finished successfully.

I then further challenged the docs, like thus:

    NSOperationQueue *myQueue = [[NSOperationQueue alloc] init];
    
    NSBlockOperation *op = [[NSBlockOperation alloc] init];
    __weak NSBlockOperation *weakOpRef = op;
    [op addExecutionBlock:^{
        NSLog(@"Say Cheese...");
        if (weakOpRef.isCancelled) { // Fail every once in a while...
            NSLog(@"Photo failed");
        }
        else {
            [NSThread sleepForTimeInterval:0.3f];
            NSLog(@"Photo taken");
        }
    }];
    
    NSOperation *processPhoto = [NSBlockOperation blockOperationWithBlock:^{
        NSLog(@"Processing Photo...");
        [NSThread sleepForTimeInterval:0.1f]; // Process  
        NSLog(@"Processing Finished.");
    }];
    
    // setup dependencies for the operations.
    [processPhoto addDependency: op];
    [op cancel];    // cancelled even before dispatching!!!
    [myQueue addOperation: op];
    [myQueue addOperation: processPhoto];
    
    NSLog(@">>> Operations Dispatched, Wait for processing");
    [myQueue waitUntilAllOperationsAreFinished];
    NSLog(@">>> Work Finished");

But was horrified to see the following output in the log:

    2020-11-05 16:18:03.803341 >>> Operations Dispatched, Wait for processing
    2020-11-05 16:18:03.803427 Processing Photo...
    2020-11-05 16:18:03.813557 Processing Finished.
    2020-11-05 16:18:03.813638+0200 TesterApp[6887:111445] >>> Work Finished

Pay attention: the cancelled op was never run - but the dependent processPhoto was executed, despite its dependency on op.

Ideas anyone?

Motti Shneor
  • Sounds familiar, like https://stackoverflow.com/questions/64671948/how-to-stop-nsoperationqueue-during-dispatch-async maybe? PS there I give a sample of how to do this. In that answer we had a lot of discussion and eventually I edited my answer and changed it quite a bit, but what you want to do I think is covered in the first part of my answer together with a sample implementation. – skaak Nov 05 '20 at 13:35
  • PPS : You have right idea but need to sync your condition and also use your own logic. – skaak Nov 05 '20 at 13:41
  • I read the reference carefully, and my issue is VERY different. First and foremost - there is nothing asynchronous in my code except for the actual NSOperations run by the concurrent NSOperationQueue. Also - I don't mix GCD and NSOperation APIs. Next - I'm not asking how to IMPLEMENT actual cancellation - but rather about THE EFFECT of cancelling operations - which (to my understanding) doesn't work as advertised (or maybe someone can show me it is) – Motti Shneor Nov 05 '20 at 17:34

2 Answers

OK. I think I solved the mystery. I just misunderstood the [NSOperation cancel] documentation.

It says:

In macOS 10.6 and later, if an operation is in a queue but waiting on unfinished dependent operations, those operations are subsequently ignored. Because it is already cancelled, this behavior allows the operation queue to call the operation’s start method sooner and clear the object out of the queue. If you cancel an operation that is not in a queue, this method immediately marks the object as finished. In each case, marking the object as ready or finished results in the generation of the appropriate KVO notifications.

I thought if operation B depends on operation A - it implies that if A is canceled (hence - A didn't finish its work) then B should be cancelled as well, because semantically it can't start until A completes its work.

Apparently, that was just wishful thinking...

What documentation says is different. When you cancel operation B (which depends on operation A), then despite being dependent on A - it won't wait for A to finish before it's removed from the queue. If operation A started, but hasn't finished yet - canceling B will remove it (B) immediately from the queue - because it will now ignore dependencies (the completion of A).

Soooo... to accomplish my scheme, I will need to introduce my own "dependencies" mechanism. The straightforward way is by introducing a set of boolean properties like isPhotoTaken, isPhotoProcessed, isPhotoColorAnalyzed etc. Then, an operation dependent on these pre-processing actions, will need to check in its preamble (of execution block) whether all required previous operations actually finished successfully, else cancel itself.
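A minimal sketch of that preamble check. It is a variant of the idea rather than the exact scheme: instead of separate boolean properties such as isPhotoTaken, it reuses the isCancelled state of the operations listed in `dependencies` as the failure signal. The operation names and the simulated failure are made up for illustration.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSOperationQueue *queue = [[NSOperationQueue alloc] init];

        // First step: only the block itself knows that it failed.
        NSBlockOperation *takePhoto = [[NSBlockOperation alloc] init];
        __weak NSBlockOperation *weakTakePhoto = takePhoto;
        [takePhoto addExecutionBlock:^{
            // Hypothetical failure: mark ourselves cancelled so later steps can see it.
            [weakTakePhoto cancel];
            NSLog(@"Photo failed");
        }];

        // Second step: its preamble inspects the operations it depends on.
        NSBlockOperation *processPhoto = [[NSBlockOperation alloc] init];
        __weak NSBlockOperation *weakProcessPhoto = processPhoto;
        [processPhoto addExecutionBlock:^{
            for (NSOperation *dependency in weakProcessPhoto.dependencies) {
                if (dependency.isCancelled) {
                    // Cancel ourselves too, so operations further down the chain
                    // can apply the same check.
                    [weakProcessPhoto cancel];
                    NSLog(@"Skipping processing: a prerequisite was cancelled");
                    return;
                }
            }
            NSLog(@"Processing photo...");
        }];

        [processPhoto addDependency:takePhoto];
        [queue addOperations:@[takePhoto, processPhoto] waitUntilFinished:YES];
    }
    return 0;
}
```

The drawback is that every execution block has to repeat this preamble, which is what makes the subclassing approach attractive.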

However, it may be worth subclassing NSBlockOperation, overriding the logic that calls 'start' to skip to finished if any of the 'dependencies' has been cancelled!

Initially I thought this is a long shot and may be hard to implement, but fortunately, I wrote this quick subclass, and it seems to work fine. Of course deeper inspection and stress tests are due:

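    @interface MYBlockOperation : NSBlockOperation {
    }
    @end

    @implementation MYBlockOperation
    - (void)start {
        // Sum the isCancelled flags of all dependencies via a KVC collection
        // operator; a non-zero sum means at least one dependency was cancelled.
        if ([[self valueForKeyPath:@"dependencies.@sum.cancelled"] intValue] > 0)
            [self cancel];
        // A cancelled operation's start skips the work and moves to finished.
        [super start];
    }
    @end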

When I substitute NSBlockOperation with MYBlockOperation in the original question (and in my other tests), the behaviour is the one I described and expected.
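For illustration, here is a sketch of how the subclass might be used in a three-step chain. The subclass is repeated so the snippet is self-contained; the operation names (takePhoto, processPhoto, analyzeColors) are hypothetical and not taken from my real code.

```objc
#import <Foundation/Foundation.h>

@interface MYBlockOperation : NSBlockOperation
@end

@implementation MYBlockOperation
- (void)start {
    // Cancel ourselves if any dependency was cancelled, then let
    // NSBlockOperation's start move us straight to finished.
    if ([[self valueForKeyPath:@"dependencies.@sum.cancelled"] intValue] > 0)
        [self cancel];
    [super start];
}
@end

int main(void) {
    @autoreleasepool {
        NSOperationQueue *queue = [[NSOperationQueue alloc] init];

        MYBlockOperation *takePhoto = [MYBlockOperation blockOperationWithBlock:^{
            NSLog(@"Say Cheese...");
        }];
        MYBlockOperation *processPhoto = [MYBlockOperation blockOperationWithBlock:^{
            NSLog(@"Processing Photo...");
        }];
        MYBlockOperation *analyzeColors = [MYBlockOperation blockOperationWithBlock:^{
            NSLog(@"Analyzing colors...");
        }];

        [processPhoto addDependency:takePhoto];
        [analyzeColors addDependency:processPhoto];

        // Cancel the head of the chain before dispatching, as in the question.
        [takePhoto cancel];
        [queue addOperations:@[takePhoto, processPhoto, analyzeColors]
           waitUntilFinished:YES];

        // Inspect how far the cancellation travelled down the chain.
        NSLog(@"processPhoto cancelled: %d, analyzeColors cancelled: %d",
              processPhoto.isCancelled, analyzeColors.isCancelled);
    }
    return 0;
}
```

The intent is that none of the three blocks runs, because each operation cancels itself in start once it sees a cancelled dependency, so the cancelled state is passed along link by link.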

Motti Shneor
  • Yes this is what I've been trying to communicate. But I do not think it is a good idea to subclass as you mention - also see my other comments and answer really to see how I suggest you solve this. But that is my opinion so let me know how you do it. – skaak Nov 05 '20 at 19:28
  • I think this is maybe the simplest and most direct way to communicate my idea of dependency in code... if any of the operations I'm depending on was canceled - cancel myself just before start (then start, in its gracious way, jumps to "finished" immediately because I'm cancelled), and so long chains will retain the behavior as well. – Motti Shneor Nov 05 '20 at 19:31
  • If you have to schedule all the blocks from the start then the way in which you do the aggregate key value expression is pretty neat. – skaak Nov 06 '20 at 06:28
  • But note you've got one T too many in your implementation line ... – skaak Nov 06 '20 at 06:44
  • Thanks, I removed the spare T. I need to stack up tens of thousands of operations for batch-processing thousands of photos. Each photo goes through 5-15 processing steps, which are slightly different for each photo (depending on results of initial steps). You can see I am using the NSOperationQueue "graph" of NSOperations extensively. I have 5 different queues, with different priorities and QOS's, and must be able to pause/resume each one independently, so... for me "dependency" is not just setting the order of execution, but more "depend" on the results of previous NSOperations. – Motti Shneor Nov 28 '20 at 19:37
  • This is such an interesting problem. Just a thought. Instead of taking each photo through the whole process, rather have a queue for each of the 15 processing steps. Then submit each photo to the first step / queue / "server". Then in that step decide what to do next and submit the photo to the next step / queue / "server" and so on. Then the dependencies are built into the processing steps themselves and you do not really have this complex and external dependency engine but more of an internal "what to do next" logic built into each step. – skaak Nov 29 '20 at 09:17
  • This, too, is being done when the result of one step changes the flow. However, where I know in advance what needs to be done, I prefer to use dependencies, and let NSOperationQueues parallelize/squeeze the machine correctly. Some steps are heavy on the "Bionic" (neural networks), some on GPU, some on CPU, and some make bottlenecks on memory. Processing 5000 full-sized photos, I started with good NSThread code, and by wisely applying dependencies and letting NSOperationQueue do the load-balancing, I went from 10min to 2min on the same iPhone, and 10x on Mac. – Motti Shneor Nov 30 '20 at 16:34

If you cancel an operation you just hint that it is done; especially in long-running tasks you have to implement the cancellation logic yourself. If you cancel an operation, the operations that depend on it will consider it finished and run without a problem.

So what you need to do is have some kind of a global synced variable that you set and get in a synced fashion and that should capture your logic. Your running operations should check that variable periodically and at critical points and exit themselves. Please don't use actual global but use some common variable that all processes can access - I presume you will be comfortable in implementing this?
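A rough sketch of what such a shared, synced variable could look like. The JobState class and its method names are invented for this example, and @synchronized is just one of several ways to guard the flag.

```objc
#import <Foundation/Foundation.h>

// Hypothetical shared state object; every operation gets a reference to it.
@interface JobState : NSObject
- (void)markFailed;
- (BOOL)hasFailed;
@end

@implementation JobState {
    BOOL _failed;
}
- (void)markFailed {
    @synchronized (self) { _failed = YES; }
}
- (BOOL)hasFailed {
    @synchronized (self) { return _failed; }
}
@end

int main(void) {
    @autoreleasepool {
        JobState *state = [[JobState alloc] init];
        NSOperationQueue *queue = [[NSOperationQueue alloc] init];

        NSOperation *takePhoto = [NSBlockOperation blockOperationWithBlock:^{
            // Simulated failure: record it in the shared state.
            [state markFailed];
        }];

        NSOperation *processPhoto = [NSBlockOperation blockOperationWithBlock:^{
            // Check the flag at the start and between steps, and exit early.
            for (int step = 0; step < 5; step++) {
                if ([state hasFailed]) {
                    NSLog(@"Earlier step failed, stopping before step %d", step);
                    return;
                }
                [NSThread sleepForTimeInterval:0.01]; // stand-in for real work
            }
            NSLog(@"Processing finished");
        }];

        [processPhoto addDependency:takePhoto];
        [queue addOperations:@[takePhoto, processPhoto] waitUntilFinished:YES];
    }
    return 0;
}
```

Here the dependency only fixes the order of execution; whether the later work actually happens is decided by the shared flag.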

Cancel is not a magic bullet that stops the operation from running; it is merely a hint to the scheduler that allows it to optimise stuff. The actual cancelling you must do yourself.

This is explanation, I can give sample implementation of it but I think you are able to do that on your own looking at the code?

EDIT

If you have a lot of blocks that are dependent and execute sequentially you do not even need an operation queue or you only need a serial (1 operation at a time) queue. If the blocks execute sequentially but are very different then you need to rather work on the logic of NOT adding new blocks once the condition fails.

EDIT 2

Just some idea on how I suggest you tackle this. Of course detail matters but this is also a nice and direct way of doing it. This is sort of pseudo code so don't get lost in the syntax.

    // Do it all in a class if possible, not a subclass of NSOpQueue
    class A

      // Members
      queue

      // job1
      synced state cancel1    // e.g. triggered by UI
      synced state counter1   // number of job1 subtasks still running
      state calc1 that job 1 calculates (and job 2 needs)

      // job2
      synced state cancel2
      synced state counter2
      state calc2 that job 2 calculates (and job 3 needs)
      ...

    start
      start on queue

        schedule job1.1 on (any) queue
           periodically check cancel1 and exit
           update calc1
           when done or exit decrease counter1

        schedule job1.2 on (any) queue
           same
        schedule job1.3
           same

      wait on counter1 to reach 0
      check cancel1 and exit early

      // When you get here nothing has been cancelled and
      // all you need for job2 is calculated and ready as
      // calc1 in the class.
      // This is why calc1 need not be synced as it is
      // (potentially) written by job1 and read by job2
      // so no concurrent access.

        schedule job2.1 on (any) queue

       and so on

This is to me most direct and ready for future development way of doing it. Easy to maintain and understand and so on.
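To make the pseudo code a bit more concrete, here is a small sketch of the schedule, wait, check, schedule pattern. It leans on waitUntilFinished instead of a hand-rolled counter, and the job numbering and the simulated failure of job 1.2 are assumptions made up for the example.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSOperationQueue *queue = [[NSOperationQueue alloc] init];
        NSLock *lock = [[NSLock alloc] init];
        __block BOOL stage1Failed = NO;   // guarded by lock

        // Stage 1: three independent subtasks (job1.1, job1.2, job1.3).
        NSMutableArray<NSOperation *> *stage1 = [NSMutableArray array];
        for (int i = 1; i <= 3; i++) {
            [stage1 addObject:[NSBlockOperation blockOperationWithBlock:^{
                BOOL ok = (i != 2);       // hypothetical: job1.2 fails
                if (!ok) {
                    [lock lock];
                    stage1Failed = YES;
                    [lock unlock];
                }
            }]];
        }

        // Block until every stage 1 subtask has finished.
        [queue addOperations:stage1 waitUntilFinished:YES];

        [lock lock];
        BOOL failed = stage1Failed;
        [lock unlock];

        if (failed) {
            // Stage 2 is never put on the queue, so there is nothing to cancel.
            NSLog(@"Stage 1 failed; stage 2 is not scheduled");
            return 0;
        }

        // Stage 2 is only scheduled once stage 1 is known to have succeeded.
        [queue addOperationWithBlock:^{
            NSLog(@"job2.1 running");
        }];
        [queue waitUntilAllOperationsAreFinished];
    }
    return 0;
}
```

In a real app the waiting should happen off the main thread, for example inside a controlling operation on another queue, so the UI is not blocked.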

EDIT 3

Reason I like and prefer this is because it keeps all your interdependent logic in one place and it is easy to later add to it or calibrate it if you need finer control.

Reason I prefer this to e.g. subclassing NSOp is that then you spread out this logic into a number of already complex subclasses and also you lose some control. Here you only schedule stuff after you've tested some condition and know that the next batch needs to run. In the alternative you schedule all at once and need additional logic in all subclasses to monitor progress of the task or state of the cancel so it mushrooms quickly.

Subclassing NSOp I'd do if the specific op that runs in that subclass needs calibration, but to subclass it to manage the interdependencies adds complexity I reckon.

(Probably final) EDIT 4

If you made it this far I am impressed. Now, looking at my proposed piece of (pseudo) code you might see that it is overkill and that you can simplify it considerably. This is because the way it is presented, the different components of the whole task, being task 1, task 2 and so on, appear to be disconnected. If that is the case there are indeed a number of different and simpler ways in which you can do this. In the reference I give a nice way of doing this if all the tasks are the same or very similar or if you have only a single subsubtask (e.g. 1.1) per subtask (e.g. 1) or only a single (sub or subsub) task running at any point in time.

However, for real problems, you will probably end up with much less of a clean and linear flow between these. In other words, after task 2 say you may kick off task 3.1 which is not required by task 4 or 5 but only needed by task 6. Then the cancel and exit early logic already becomes tricky. The reason I do not break this one up into smaller and simpler bits is that, like here, the logic can (easily) span those subtasks, and that this class A represents a bigger whole, e.g. clean data or take pictures or whatever your big problem is that you try to solve.

Also, if you work on something that is really slow and you need to squeeze out performance, you can do that by figuring out the dependencies between the (sub and subsub) tasks and kicking them off asap. This type of calibration is where (real life) problems that took way too long for the UI become doable, as you can break them up and (non-linearly) piece them together in such a way that you can traverse them in a most efficient way.

I've had a few such problems, and one in particular that I am thinking of now became extremely fragile, with logic that was difficult to follow, but this way I was able to bring the solution time down from an unacceptable more than a minute to just a few seconds, which was agreeable to the users.

(This time really almost the final) EDIT 5

Also, the way it is presented here, as you make progress in solving the problem, at those junctures between say task 1 and 2 or between 2 and 3, those are the places where you can update your UI with progress and parts of the full solution as it trickles in from all the various (sub and subsub) tasks.

(The end is coming) EDIT 6

If you work on a single core then, except for the interdependencies between tasks, the order in which you schedule all those sub and subsub tasks does not matter since execution is linear. The moment you have multiple cores you need to break the solution up into subtasks that are as small as possible and schedule the longer running ones asap for performance. The performance squeeze you get can be significant but comes at the cost of increasingly complex flow between all the small subtasks and in the way in which you handle the cancel logic.

skaak
  • But this beats the whole point of operation dependencies? I programmed multithreaded code many years, and know very well how to use dispatch_groups for such thing as create dependencies --- however, NSOperationQueue is designed to support this inherently - the docs say it clearly - if an operation is cancelled, dependent operations will not be started. – Motti Shneor Nov 05 '20 at 14:35
  • Also been doing concurrent stuff for years and you always had to cancel yourself. If you add a cancelled op to the queue it is just considered done and the dependencies start immediately. – skaak Nov 05 '20 at 14:36
  • As far as dependencies are concerned cancel is just the same as finished. – skaak Nov 05 '20 at 14:37
  • many of my operations are independent, and I spread over several different queues (as per priority, quality-of-service and ability to run concurrently). However, when 4 such operations complete (successfully) next stage (another operation) can start. Therefore, I set the next-stage operation to depend on this 4 operations - and dispatch it. Dependencies DO WORK - and it will never run before all 4 have completed. However - docs say that if one of the 4 was cancelled - next stage should not run, and it does. – Motti Shneor Nov 05 '20 at 14:38
  • Really sounds like an interesting problem. You could use latches or semaphores to solve it but that is just a thought. You need to think about scheduling and do that cleverly. It seems you have concurrent operations that are dependent on one another and you want to implement the logic using cancel. That won't work. You need to either schedule the interdependencies much better / cleverer or you need to implement some global conditions to control the flow. – skaak Nov 05 '20 at 14:40
  • Of course with global here I mean properly synced, accessible to all operations stuff such as a type of controller class ... not really global variables ... but I am sure you know that. Again, you rely on cancel for some of your logic e.g. when sub-dependent operations need to kick off. That is a mistake. If you want to cancel you need to implement your own logic or you need to schedule better and NOT schedule if earlier in the chain stuff was cancelled. – skaak Nov 05 '20 at 14:43
  • From the sounds of it you need to wait somewhere for those 4 ops to finish and only then schedule the remaining stuff. That is the nice and clean way. Then you schedule say 6 more and again you wait and when done you schedule the rest. This just an example but you get the point. If you add all your ops from the beginning you need some crazy logic to cater for it all. Cancel is not your friend, it will just remove the cancelled block from the queue and start running the ones you hope would not run. – skaak Nov 05 '20 at 14:45
  • The queue uses the op dependencies to decide what to run next. It does not cascade the cancelled state to the dependent ops. This is why it is clean to wait and schedule only when you know it will (need to) run. – skaak Nov 05 '20 at 14:49
  • The first statement is not true. You do not hint that it is done, but that it is un-needed. In my code you don't see the logic, as it is completely synchronous, and complicated, and thus I replaced it with a simple [NSThread sleep] that blocks the executing thread just like real loops and other logic would. – Motti Shneor Nov 05 '20 at 17:36
  • I am NOT asking how to cancel execution of a specific block. My code calls some synchronous APIs - and if they fail - it just cleans up and exits. I am asking about DEPENDENCIES. Since MacOS 10.6 - by the docs - canceled operation should also cancel ALL DEPENDENT other operations, regardless of the queue they're pending in. Cancelled operation - here I demonstrate canceling both BEFORE execution and WITHIN execution - for some reason break this rule. I was wondering whether NSBlockOperation is a crippled implementation, not providing the basics described in the docs for NSOperation? – Motti Shneor Nov 05 '20 at 17:39
  • So... you were right from the start, and your shortest comment (the one I thumbed-up) should have been posted as the answer :) – Motti Shneor Nov 05 '20 at 18:23
  • @MottiShneor thanks! I am just the messenger here so don't shoot me. You'll know this stuff gets complex quickly but is also nice problem to work on and to solve. You can do it I am sure, but note that you should at least consider some of my other comments as well. At least consider doing the scheduling cleverly otherwise it can mushroom into something that is difficult to control and develop further later. If you put all the ops on the queue from the beginning there are too many things to keep an eye on, you want this tight and close to your chest if that helps. – skaak Nov 05 '20 at 19:20
  • @MottiShneor what follows is very subjective and I'd like to hear how you solve this or your view on it, but I would not subclass NSBlockOperation. In fact, I would not touch it. See it as a service the OS delivers and write your own queuer if that makes sense. This can all be one long block of code, something like schedule stuff, wait for completion, check conditions, schedule stuff, wait for completion etc. You need sync state for all the conditions and some you may need to monitor continously, and this is where the solution lies, but I don't think it is as a NSBlockOp subclass. – skaak Nov 05 '20 at 19:25
  • I recommend that you watch the "Advanced NSOperations" video from WWDC 2015 (Dave Delong) and then download the sample code about higher-level semantics and subclassing NSOperation. I've done many NSOperation subclasses in the past - some replacing the whole shebang (when I wrote low-level video/audio socket with dynamic network congestion handling) etc. There's nothing wrong with subclassing NSOperation. – Motti Shneor Nov 05 '20 at 19:40
  • Ok thanks. See my suggestion in my edit 2. My bad I thought you were trying to subclass NSOperationQueue ... I got my words mixed up in that previous comment of mine. I would not subclass NSOpQueue! But I would not subclass NSBlockOp anyhow and rather use as in edit 2 to sync the dependencies between concurrent workers. I would subclass NSBlockOp for that special things for that specific task but not necessarily to help with the syncing between different concurrent or dependent workers. – skaak Nov 05 '20 at 19:51