I'd like to monitor the number of currently running Cadence workflows in Prometheus.
I checked the metrics exported by the different Cadence services (cadence_history, cadence_worker, cadence_frontend, and so on), and the only workflow-related metrics I could find were:
- activity_end_to_end_latency histogram (workflowType is one of the labels)
- workflow_success counter / workflow_terminate counter / workflow_failed counter
So it seems that there are metrics for analyzing workflows that have already completed, but nothing about the ones currently running. Am I right, or did I miss something?
If so, I have to export the needed metrics myself, and I see two possible solutions:
- create a gauge and increment it when my workflow starts and decrement it when it ends, for example:
    func MyWorkflow(ctx workflow.Context) error {
        // Count this workflow as running until the function returns.
        mymetrics.gauge.Inc()
        defer mymetrics.gauge.Dec()

        if err := workflow.ExecuteActivity(ctx, someActivity).Get(ctx, nil); err != nil {
            return err
        }
        // ...
        return nil
    }
The disadvantage of this approach is that workflows terminated manually by the user will not be measured correctly: the workflow code never reaches the decrement, so the gauge stays too high.
- create a Prometheus exporter and use the cadence.client.ListOpenWorkflow function to collect the number of running workflows. However, the Cadence docs say that "heavy usage of this API may cause huge persistence pressure", so I suppose it's a very bad idea to call it from inside a Prometheus exporter.
Do you see any other possible solutions?