Why can't fewer than 3m+1 generals cope with m traitors?

Question

Lately I am reading the paper The Byzantine Generals Problem and confused by the conclusion that "there is no solution with fewer than 3m+1 generals can cope with m traitors".

It shows in the paper that in the case of 3 generals there is no solution for oral messages which I can understand. But when it comes to 3m generals the paper proves by contradiction: Firstly using each Byzantine general to simulate at most m Albanian generals and then shows that if there is a solution for these 3m generals with m traitors then the solution for 3 Byzantine generals exists which has been proved not possible.

I cannot get that. Does it mean that when three groups each with the same number of generals (like m generals) act the same we can consider the m generals in a group as a single one general?

score 1 · Answer 1 · answered Aug 25 '18 at 13:02

I copied my answer from this question. The post shows that why 3f+1 replicas(3m+1 generals) are needed to tolerate f number Byzantine faulty replicas (m traitors) by illustrating a specific voting scenario. In Byzantine general terms, for example, 0/1 values might represent attack/retreat.

I think I need to elaborate the answer with illustrating coordinated attack of malicious replicas including primary. Let's say n replicas where n = 3f + 1 = 100, f = 33 in Byzantine fault tolerant system. In the system, the system can tolerate f number of Byzantine faulty replica. Now I give an counter-example to answer your question. Consider the following setting; I partitioned n replicas into three group;

G1 = {b1, b2, ... , b33} for Byzantine faulty replicas including Byzantine primary(b1), |G1| = 33 G2 = {r1, r2, ... , r33} for correct replica group, |G2| = 33 G3 = {r34, r35, ... , r67} for correct replica group, |G3| = 34 Because n = |G1| + |G2| + |G3| = 33 + 33 + 34 = 100, above partition makes sense. And G1 is entirely controlled in a coordinated way by super-genius hacker who are especially interested in destroy the protocol.

Now, I will demonstrate how above setting violates safety condition if the commit-phase disappears from the protocol; (The safety condition means that the state of G2 and G3 should be the same). For simple description, the consensus value is simplified as a binary value, not the request with sequence number.

[Pre-Prepare phase]: Primary(b1) sends a 0 value to G2 and 1 value to G3. This situation is not a problem cause we assume Byzantine primary. [Prepare phase]: Now replicas in G2 and G3 exchange the message from the primary to check if they both have the same message. But, In this phase, replicas from G1 sends a 0 value to G2 and sends a 1 value to G3. After message exchange, the situation is as follows

a. Replicas in G2 received following results; 66 votes for 0 value, 34 votes for 1 value.

b. Replicas in G3 received following results; 33 votes for 0 value, 33+34=67 votes for 1 value.

Because quorum size is 2f+1 = 67, replicas in G3 accepts the proposed value from Byzantine primary who coordinates with Byzantine replicas while replicas in G2 doesn't.

So in the system, even though the system can tolerate up to 33 Byzantine faulty replicas including primary, it immediately fails in your assumption.

Why can't fewer than 3m+1 generals cope with m traitors?

1 Answers1