
Is there any way other than MaximumSignalsPerExecution to have a workflow reject signals?

Basically I have a workflow that periodically continues as new before it times out. But if it's continuously getting a lot of signals before it can continue as new, it will end up timing out and losing some signals. I can set MaximumSignalsPerExecution lower so it rejects signals before timing out, but ideally I'd like something that could be configured at the workflow level.

I'm testing some worst-case scenarios where there is a traffic spike and the workflow gets multiple signals per second.

Sharan Foga
Aleks

2 Answers


Firstly, to answer your question: unfortunately there aren't any other limits that can stop signals from being sent. I'm not sure how useful that would be, though. The Signal, Start, and SignalWithStart APIs are among the most critical APIs that Cadence users call, because these are the ones that persist data in Cadence. We usually try to keep these APIs as highly available as possible so that we can accept and persist the data that comes with these requests. Otherwise, clients would have to either keep a fallback datastore to persist the requests rejected by Cadence, or just propagate failures to their upstream.

The primary benefit of the MaximumSignalsPerExecution limit is to protect the Cadence database from unbounded growth of a single workflow execution. I wouldn't recommend playing with it just to improve one use case.

The race between signals and ContinueAsNew is a relatively common problem. I have some ideas to address it, but I don’t think we’ll be able to do it soon due to competing priorities. In the meantime, I think the best thing to do is to set the right expectations for your workflow. Here are two principles that you should keep in mind when designing your workflows:

  • Even though Cadence scales horizontally very well in terms of the number of concurrent workflows, it doesn't scale very well on a per-workflow basis. The reason is that the Cadence server acquires locks for each workflow and performs every DB update as an expensive transaction, in order to provide the consistency guarantees needed behind the scenes to keep workflows safe. The rule of thumb I recommend: design your workflows such that they don't generate more than 10 events per second, including the signals received. Beyond that limit, you'll start hitting lock contention on that particular workflow, leading to latency increases and timeouts even though the rest of the system is quite healthy/idle.
  • The decision tasks (i.e., the code where your workflow reacts to events and decides the next steps) should be very fast, as in milliseconds. This increases the chance that everything around the workflow moves quickly without blocking.

In your case, you should look into the following ideas if you haven’t already:

  • Think about how you could reduce the number of signals per workflow if you had a hard limit of, say, 5 per second. Perhaps you can work around this by starting many similar workflows instead of just one and signaling a random one to distribute the load across them.
  • Make sure your decisions are as fast as possible, especially when processing signals and when trying to continue as new.
  • If you are trying to schedule any work (i.e., an activity, etc.) upon receiving a signal, try to just add it to a list and pass that list as input to the next run of the workflow, so the new run does the signal-handling work that the previous run skipped.
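The last bullet, draining still-pending signals into a list and handing them to the next run, can be sketched with a plain Go channel. This is a simulation, not Cadence API code: with the Cadence Go client the equivalent loop would use `workflow.GetSignalChannel(ctx, name)` and its non-blocking `ReceiveAsync` just before calling `workflow.NewContinueAsNewError`.

```go
package main

import "fmt"

// drainPending simulates the "drain before ContinueAsNew" step: before a
// workflow run finishes, collect any signals still buffered on the channel
// into a slice so they can be passed as input to the next run.
func drainPending(ch chan string) []string {
	var pending []string
	for {
		select {
		case sig := <-ch:
			pending = append(pending, sig)
		default:
			return pending // channel empty: safe to continue-as-new now
		}
	}
}

func main() {
	signals := make(chan string, 16)
	signals <- "order-1"
	signals <- "order-2"
	signals <- "order-3"

	carryOver := drainPending(signals)
	// The next run would receive carryOver as its input and process these
	// entries first, so nothing is lost across the ContinueAsNew boundary.
	fmt.Println(carryOver) // prints [order-1 order-2 order-3]
}
```

Note that a signal arriving between the drain and the actual ContinueAsNew can still race; the drain only narrows the window, which is why the answer frames this as mitigation rather than a complete fix.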
Emrah Seker

In addition to Emrah's detailed answer and ideas, I want to add a few more ways to address this using Cadence itself. These ideas won't require you to use any other technology (DB, message queue, etc.).

  • Depending on what condition you want to use to reject new signals, you can potentially use a search attribute to record some state of the workflow. Inside the workflow, use the UpsertSearchAttributes API to update the condition. Then, before sending any signal, use the DescribeWorkflowExecution API to read the search attribute and decide whether or not to send the signal.
    • The above idea slightly misuses search attributes, because we aren't using the attribute to search for anything. Ideally I would use a memo for this, but memos are not mutable within a workflow today. There is an open issue proposing to allow updating memos the same way search attributes can be updated.
    • Because this idea doesn't actually need to search for workflows, you can use it this way even without enabling AdvancedVisibility for search attributes.
  • Alternatively, you could use a query method instead of a search attribute. But a query is much heavier than describing a workflow: DescribeWorkflowExecution costs almost nothing on the Cadence side and is extremely fast. So I would suggest using a search attribute in this case.
  • In the long term, Cadence should support a rejection condition that is evaluated before a signal is accepted. There are already some ideas for that, but it's complicated to design and implement. The name is TBD; some call it the "Update API", some "SignalWithQuery", but essentially we want to provide a synchronous API that sends a request and gets a response back.
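The search-attribute gate in the first bullet can be sketched as below. This is a simulation with an in-memory map standing in for the visibility store, and the attribute name "SignalBacklog" and both helper functions are made up for illustration; in real code the workflow would call `workflow.UpsertSearchAttributes` and the sender would read the value back via the client's `DescribeWorkflowExecution` before deciding to signal.

```go
package main

import "fmt"

// visibility stands in for Cadence's visibility store, keyed by workflow ID.
// The hypothetical "SignalBacklog" attribute holds a backlog count.
var visibility = map[string]int{}

// upsertSearchAttributes models what the workflow does when its backlog
// changes (really workflow.UpsertSearchAttributes inside workflow code).
func upsertSearchAttributes(workflowID string, backlog int) {
	visibility[workflowID] = backlog
}

// shouldSignal models what the sender does before calling SignalWorkflow:
// read the attribute (really via DescribeWorkflowExecution) and compare it
// against a limit of its own choosing.
func shouldSignal(workflowID string, maxBacklog int) bool {
	return visibility[workflowID] < maxBacklog
}

func main() {
	upsertSearchAttributes("wf-1", 3)
	fmt.Println(shouldSignal("wf-1", 5)) // prints true: backlog below limit

	upsertSearchAttributes("wf-1", 7)
	fmt.Println(shouldSignal("wf-1", 5)) // prints false: back off, retry later
}
```

Since the sender's read and the workflow's update are not atomic, the check is best-effort: it sheds load during a spike rather than guaranteeing a hard cap, which matches the cooperative spirit of this answer.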
Long Quanzheng