Firstly, to answer your question: unfortunately, there aren't any other limits that can cap the number of signals being sent. I'm not sure how useful that would be, though. The signal, start, and signalWithStart APIs are among the most critical APIs that Cadence users call, because they are the ones that persist data in Cadence. We try to keep these APIs as highly available as possible so that we can accept and persist the data that comes with these requests. Otherwise, clients would have to either maintain a fallback datastore to persist the requests rejected by Cadence, or propagate failures to their upstream.
The primary benefit of the MaximumSignalsPerExecution limit is to protect the Cadence database from unbounded growth of a single workflow execution. I wouldn't recommend playing with it just to improve one use case.
The race between signals and ContinueAsNew is a relatively common problem. I have some ideas to address it, but I don't think we'll be able to get to it soon due to competing priorities. In the meantime, I think the best thing to do is to set the right expectations for your workflow. Here are two principles that you should keep in mind when designing your workflows:
- Even though Cadence scales horizontally very well in terms of the number of concurrent workflows, it doesn't scale very well on a per-workflow basis. The reason is that the Cadence server acquires locks for each workflow and performs every DB update as an expensive transaction in order to provide the consistency guarantees needed behind the scenes to keep workflows safe. The rule of thumb I recommend is: design your workflows so that they don't generate more than 10 events per second, including the signals received. When you exceed that limit, you'll start hitting lock contention on that particular workflow, leading to latency increases and timeouts, even though the rest of the system is quite healthy/idle.
- The decision tasks (i.e. how long it takes your workflow to react to events and decide the next steps) should be very fast, as in milliseconds. This increases the chance that everything around the workflow moves quickly without blocking each other. (A minimal sketch of a fast signal handler follows this list.)
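To make the second principle concrete, here is a minimal Go sketch using the Cadence Go client (go.uber.org/cadence/workflow). The signal name, payload type, and activity name are hypothetical; the point is that the decision only records the signal and delegates the heavy work to an activity:

```go
package sample

import (
	"time"

	"go.uber.org/cadence/workflow"
)

// OrderSignal is a hypothetical signal payload.
type OrderSignal struct {
	OrderID string
}

// FastDecisionWorkflow keeps every decision task in the millisecond range:
// receiving a signal only records the payload and schedules an activity.
// All heavy work runs on activity workers, outside the decision task.
func FastDecisionWorkflow(ctx workflow.Context) error {
	ao := workflow.ActivityOptions{
		ScheduleToStartTimeout: time.Minute,
		StartToCloseTimeout:    5 * time.Minute,
	}
	ctx = workflow.WithActivityOptions(ctx, ao)

	ch := workflow.GetSignalChannel(ctx, "order-signal") // hypothetical signal name
	for i := 0; i < 1000; i++ {                          // bounded loop; continue-as-new afterwards
		var s OrderSignal
		ch.Receive(ctx, &s)
		// Do NOT compute anything expensive here; hand it to an activity.
		workflow.ExecuteActivity(ctx, "ProcessOrder", s) // hypothetical activity
	}
	return nil
}
```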
In your case, you should look into the following ideas if you haven’t already:
- Think about how you would reduce the number of signals per workflow if you had a hard limit of, say, 5 per second. Perhaps you can work around this by starting many similar workflows instead of just one and signaling a random one to distribute the load across them (see the first sketch after this list).
- Make sure your decisions are as fast as possible, especially when processing signals and when trying to continue as new.
- If you are trying to schedule any work (i.e. an activity, etc.) upon receiving the signal, try to just put it into a list and pass that list as an input to the next run of the workflow, so the new run does the signal-handling work that the previous run skipped (see the second sketch below).
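For the fan-out idea, here is a hedged client-side sketch using the Cadence Go client (go.uber.org/cadence/client). The shard count, workflow ID scheme, and signal name are assumptions, and it presumes you have already started one workflow per shard:

```go
package sample

import (
	"context"
	"fmt"
	"math/rand"

	"go.uber.org/cadence/client"
)

// numShards is a hypothetical shard count; size it so each workflow stays
// well under the ~10 events/second rule of thumb.
const numShards = 8

// signalRandomShard spreads the signal load across many similar workflows
// instead of funneling everything into a single hot execution.
func signalRandomShard(ctx context.Context, c client.Client, payload interface{}) error {
	workflowID := fmt.Sprintf("orders-shard-%d", rand.Intn(numShards)) // hypothetical ID scheme
	// An empty runID targets the currently running execution for this workflow ID.
	return c.SignalWorkflow(ctx, workflowID, "", "order-signal", payload)
}
```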
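And for the last idea, a sketch of carrying unprocessed signals into the next run. Channel.ReceiveAsync performs a non-blocking receive, which lets the workflow drain signals that arrived during the run before calling ContinueAsNew. The OrderSignal payload and the names are the same hypothetical ones as in the first sketch, and the workflow signature taking the carried-over list is an assumption:

```go
package sample

import (
	"time"

	"go.uber.org/cadence/workflow"
)

// BatchingWorkflow processes signals carried over from the previous run first,
// then live signals, and hands any still-unprocessed signals to the next run
// so that signals racing with ContinueAsNew are not lost.
func BatchingWorkflow(ctx workflow.Context, pending []OrderSignal) error {
	ao := workflow.ActivityOptions{
		ScheduleToStartTimeout: time.Minute,
		StartToCloseTimeout:    5 * time.Minute,
	}
	ctx = workflow.WithActivityOptions(ctx, ao)

	ch := workflow.GetSignalChannel(ctx, "order-signal") // hypothetical signal name
	for processed := 0; processed < 1000; processed++ {  // bound history size per run
		var s OrderSignal
		if len(pending) > 0 {
			s, pending = pending[0], pending[1:] // work skipped by the previous run
		} else {
			ch.Receive(ctx, &s)
		}
		workflow.ExecuteActivity(ctx, "ProcessOrder", s) // hypothetical activity
	}

	// Drain signals delivered while this run was finishing, and pass them to
	// the next run as input instead of dropping them.
	for {
		var s OrderSignal
		if !ch.ReceiveAsync(&s) {
			break
		}
		pending = append(pending, s)
	}
	return workflow.NewContinueAsNewError(ctx, BatchingWorkflow, pending)
}
```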