
TL;DR - BFT cluster with 4-5 notary nodes grinds to a halt when one replica is killed.

I ran the notary demo and the Raft cluster (with 3 notary nodes) behaved as expected - when I kill the leader, there's an election and the notary cluster continues to provide a reliable service.

I expect the same thing to happen when I run a BFT cluster (with 4 notary nodes) - killing one of the replicas should not stop the cluster from providing a reliable notary service. However, here is what happens:

1) Start the BFT notary cluster

2) I can notarise 10 transactions using gradlew samples:notary-demo:notarise

3) Stop one of the replicas in the cluster

4) Try to notarise 10 transactions again using gradlew samples:notary-demo:notarise (a sketch of what I understand this call does is below the list)

5) Wait for a few minutes; nothing happens (the transactions are not notarised)

6) The terminals of all the remaining replicas keep filling with "re-connecting to replica 1 at /127.0.0.1:11010"
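For reference, here is a minimal sketch of what I understand the notarise task to be doing under the hood: a flow hands the signed transaction to NotaryFlow.Client and suspends until the cluster's signatures come back. The class name RequestNotarisationFlow is just illustrative; the demo's actual flow may look different.

    import co.paralleluniverse.fibers.Suspendable
    import net.corda.core.crypto.TransactionSignature
    import net.corda.core.flows.FlowLogic
    import net.corda.core.flows.InitiatingFlow
    import net.corda.core.flows.NotaryFlow
    import net.corda.core.flows.StartableByRPC
    import net.corda.core.transactions.SignedTransaction

    // Illustrative flow, not the demo's actual class: it hands an already-signed
    // transaction to the notary service and suspends until the notary's
    // signatures come back.
    @InitiatingFlow
    @StartableByRPC
    class RequestNotarisationFlow(private val stx: SignedTransaction) :
        FlowLogic<List<TransactionSignature>>() {

        @Suspendable
        override fun call(): List<TransactionSignature> {
            // If the BFT cluster never reaches consensus, this call never returns,
            // which would match the "nothing happens" symptom in step 5.
            return subFlow(NotaryFlow.Client(stx))
        }
    }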

Just to be on the safe side, I decided to add another notary node to the cluster. However, nothing changed: with 5 notary nodes, killing one of them still makes the cluster grind to a halt.

I looked into how BFT-SMaRt works, and as far as I can tell it should tolerate up to f failures of any kind (including crash-stop) as long as there are enough working replicas (N >= 3f + 1).
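Concretely, the standard bound means a cluster of N replicas tolerates f = (N - 1) / 3 (integer division) faulty replicas, and a crashed replica is a strictly weaker failure than a Byzantine one. A quick sanity check in plain Kotlin (nothing Corda-specific):

    // Maximum number of faulty replicas an N-replica BFT cluster can tolerate
    // under the standard N >= 3f + 1 bound.
    fun maxTolerableFaults(n: Int): Int = (n - 1) / 3

    fun main() {
        for (n in listOf(4, 5)) {
            println("N=$n replicas -> tolerates f=${maxTolerableFaults(n)} faulty replica(s)")
        }
        // N=4 -> f=1 and N=5 -> f=1, so killing a single replica should not
        // halt either cluster if the protocol is behaving as advertised.
    }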

Is there something I'm missing here? Is the behaviour I'm expecting unreasonable - a BFT cluster with 4-5 notary nodes tolerating 1 node dying? Or is this an issue with Corda?

qlfu_qlfu

1 Answer


It's hard to know what the issue was in this case, as there isn't much information to go on. However, the Corda repo has updated this sample recently, so it may be worth revisiting the project to see whether it works correctly now.

Here's a link to the recent 4.5 release notary demo:

https://github.com/corda/corda/tree/release/os/4.5/samples/notary-demo

davidawad