I copied my answer from this question. The post shows that why 3f+1 replicas(3m+1 generals) are needed to tolerate f number Byzantine faulty replicas (m traitors) by illustrating a specific voting scenario. In Byzantine general terms, for example, 0/1 values might represent attack/retreat.
I think I need to elaborate the answer with illustrating coordinated attack of malicious replicas including primary. Let's say n replicas where n = 3f + 1 = 100, f = 33 in Byzantine fault tolerant system. In the system, the system can tolerate f number of Byzantine faulty replica. Now I give an counter-example to answer your question. Consider the following setting; I partitioned n replicas into three group;
G1 = {b1, b2, ... , b33} for Byzantine faulty replicas including Byzantine primary(b1), |G1| = 33
G2 = {r1, r2, ... , r33} for correct replica group, |G2| = 33
G3 = {r34, r35, ... , r67} for correct replica group, |G3| = 34
Because n = |G1| + |G2| + |G3| = 33 + 33 + 34 = 100, above partition makes sense. And G1 is entirely controlled in a coordinated way by super-genius hacker who are especially interested in destroy the protocol.
Now, I will demonstrate how above setting violates safety condition if the commit-phase disappears from the protocol; (The safety condition means that the state of G2 and G3 should be the same). For simple description, the consensus value is simplified as a binary value, not the request with sequence number.
[Pre-Prepare phase]: Primary(b1) sends a 0 value to G2 and 1 value to G3. This situation is not a problem cause we assume Byzantine primary.
[Prepare phase]: Now replicas in G2 and G3 exchange the message from the primary to check if they both have the same message. But, In this phase, replicas from G1 sends a 0 value to G2 and sends a 1 value to G3. After message exchange, the situation is as follows
a. Replicas in G2 received following results; 66 votes for 0 value, 34 votes for 1 value.
b. Replicas in G3 received following results; 33 votes for 0 value, 33+34=67 votes for 1 value.
Because quorum size is 2f+1 = 67, replicas in G3 accepts the proposed value from Byzantine primary who coordinates with Byzantine replicas while replicas in G2 doesn't.
So in the system, even though the system can tolerate up to 33 Byzantine faulty replicas including primary, it immediately fails in your assumption.