
I am using the Camunda community edition for a workflow project that orchestrates a microservice flow, similar to this. The features in the community edition are enough for my requirements, except for high availability and auto recovery.

For high availability, if I make the database (MySQL) highly available as per this guide and run two or more instances of the Spring-based Camunda manager behind a load balancer, would that be enough?

How do I recover if Camunda accepts the BPMN request and the node fails or crashes after receiving it?

In my case, each Spring-based Camunda manager receives the request and confirms it to the user with 202 (Accepted); Camunda then starts executing the workflow. So how can that job be recovered and resumed automatically if the node that received the request crashes?

scoder

1 Answer


Running multiple instances of the engine (= multiple Spring applications) on top of a highly available database (make sure it supports the READ COMMITTED isolation level, see https://docs.camunda.org/manual/7.12/introduction/supported-environments/#databases) is definitely sufficient to make Camunda highly available.

If a node crashes after responding with 202, you fall back to "normal" Java/Spring transaction handling. https://docs.camunda.org/manual/7.12/user-guide/process-engine/transactions-in-processes/#transaction-boundaries should help to clarify this.

So if you make sure that you start your workflow instance (probably with an async start event), commit this transaction, and only then return 202, you are safe. The only problem that can arise is a crash before returning 202, which typically leads to a retry on your REST API. For this case you should make sure you start your workflow idempotently.
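To illustrate what "start idempotently" can mean, here is a minimal sketch in plain Java. It is not Camunda API: `IdempotentWorkflowStarter`, `startOnce`, and the client-supplied `requestId` are hypothetical names, and the `Function` stands in for whatever actually starts and commits the workflow instance. The in-memory map only shows the logic; in a cluster the lookup would have to live in the shared database (for example a unique column for the request ID, which is an assumption about your schema), otherwise a retry landing on another node would not see the first attempt.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper: guarantees that a retried request with the same
 * requestId does not start a second workflow instance.
 */
class IdempotentWorkflowStarter {

    // requestId -> workflow ID. Stand-in for a table in the shared HA database.
    private final Map<String, String> startedByRequestId = new ConcurrentHashMap<>();

    // Placeholder for the real "start instance and commit" call.
    private final Function<String, String> startWorkflow;

    IdempotentWorkflowStarter(Function<String, String> startWorkflow) {
        this.startWorkflow = startWorkflow;
    }

    /**
     * Starts the workflow only if this requestId has not been seen before;
     * otherwise returns the workflow ID recorded for the first attempt.
     */
    String startOnce(String requestId) {
        return startedByRequestId.computeIfAbsent(requestId, startWorkflow);
    }
}
```

With this shape, the REST handler would call `startOnce(requestId)` and return 202 with the resulting workflow ID; a client retry after a lost response gets the same ID back instead of triggering a duplicate run.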

rob2universe
Bernd Ruecker
Thanks for the answer. Yes, the MySQL database configured for Camunda is HA. Regarding node failure, in my case a single Spring/Camunda service accepts the request, puts it in the DB (my own user table), starts the Camunda transaction manager in an async task, and then returns 202 with the workflow ID (DB ID) to the caller. Now suppose the node running this service crashes or loses power when only half of the activities in the workflow are completed. Will the workflow continue with the remaining activities when I restart an identical service on another node? – scoder Feb 07 '20 at 09:46