I'm doing system design interview prep and was practicing with a "designing WhatsApp" question. One of the things I'm still unclear on is how a system like WhatsApp maintains order in message delivery. I'll give my high-level understanding:
Message delivery needs to be causal but doesn't need to be linearizable. That is, concurrent messages (where concurrency is defined as messages between "typing" and "ack'd by server") are incomparable and can be delivered in any order.
Ordering/causality can apply to intra-group messages (the most prominent case) or inter-group messages (e.g., if two users are part of two different shared chats with other users).
Most designs I've seen either don't explicitly address ordered message delivery or address it with a per-user message queue (e.g., https://www.youtube.com/watch?v=WzBzYX1aSrU&t=0s). This per-user queue could be either be an on-server queue (e.g., RabbitMQ) or something like a FIFO SQS queue with user_id as the message group id (though the SQS approach might not scale given FIFO SQS constraints). However, it's not clear to me how a per-user queue completely solves the message ordering question. A per-user queue ensures ordered delivery of messages but does nothing to ensure that messages are placed on a given user's queue in the correct order.
There are two main ways I can think of to avoid this. One is to assign message workers on a per-group basis. Then we have an architecture where for each user, there is a single gateway assigned to handle that user/client's messages (send and receive - Gaurav Sen discusses this approach in https://www.youtube.com/watch?v=vvhC64hQZMk). For each group, there is a single message processor assigned to handle that group's messages. In this way, the message processor will push a given message to each user's gateway queue (or to temp storage if that user is offline) before it processes another message.
The other way to address this would be a total order broadcast (though perhaps some sort of total order multicast could be used). However, I don't necessarily see how this would scale, as it's not clear that multicast would work (and broadcast doesn't seem like it would scale).
TLDR; when designing a messaging app, it seems that ordered message delivery defines a partial order that requires both a per-user queue (for delivering messages in order) and a per-group queue (for putting messages on each recipient's queue in the same order).