6

I was once asked on an interview, how would you deal with messages coming in out of order in a message queue. It has been a while and I have not found a definitive answer and I was wondering if an expert in the field can help me answer it to address my own curiosity.

I understand that some message queues provide exactly-once and FIFO guarantees. Also I am aware of the notion of event time and processing time in streaming systems. For instance, in log based message queues like Kafka, mixed up ordering may be less likely to happen due to the presence of offsets and message durability (I may be wrong). I have also thought about using timestamps requiring each message sender to record the time of message before sending it but this is fraught with inconsistency due to clock skew.

Given all of that, I am wondering how can one address mixed up ordering in a traditional messaging system like AMQP, JMS or RabbitMQ where a dozen of IOT devices may be sending messages and I as a consumer want to reconcile them in the correct order.

OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
Mike
  • 523
  • 5
  • 7

2 Answers2

4

If queue your system is using, provides ordered message guarantee, then simply use that channel(like kakfa's single partition, AMQP under some settings). But if queue your system is using does not provide strict ordering then general Idea is that client can have monotonically increasing[1] number(or timestamp) attached with each message it sends to queue. This forms the basis of sequence which producer intends to send to its receivers.

How to get montonically increasing value:

Using timestamp: POSIX clock_gettime() function with CLOCK_MONOTONIC[2] provides option to get monotonically increasing timestamp, which can be used by producer to put timestamp on each message. Receiver can identify out of order messages when it sees that received message has timestamp older than latest message.

Using sequence number: Before sending each message you can simply increase an atomic counter and attach counter value to each message, so that receiver can know about intended ordering. This will form strictly increasing sequence. Approach is very similar to Lamport's logical clock[3] which provides virtual clock for producer.

Dealing with out of order messages on receiver side: This is pretty much application specific but in general you have 2 options when messages arrive out of order: a) discard the older message, like in cases in which receiver have to show latest value of a stock. b) Have buffer to reorder sequencing, like within a TCP connection(e.g. zookeeper uses TCP as queue for FIFO ordering [4-5])

Tools: If you are not adding timestamp to messages, then send all messages to Apache kafka single partition in sequence from producer, as this will ensure that receiver can receive messages in sequence.

If you are using messaging system which does not guarantee ordered delivery (like AMQP under some settings[6]), then you can consider adding additional monotonically increasing number/clock with each message.

[1] https://en.wiktionary.org/wiki/monotonic_increasing#targetText=Adjective,contrast%20this%20with%20strictly%20increasing

[2] https://linux.die.net/man/2/clock_gettime

[3] https://en.wikipedia.org/wiki/Lamport_timestamps#Lamport's_logical_clock_in_distributed_systems

[4] https://cwiki.apache.org/confluence/download/attachments/24193445/zookeeper-internals.pdf?version=1&modificationDate=1295034038000&api=v2

[5] http://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouzaMedeiros.pdf

[6] RabbitMQ - Message order of delivery

WebServer
  • 1,316
  • 8
  • 12
  • 1
    Thank you for the wonderfully comprehensive answer, this puts an end to the constant dissonance I had in my head. I was aware of logical clocks in distributed systems but never thought about it in this use case! I appreciate it. – Mike Oct 22 '19 at 03:58
2

I can answer with respect to Apache Kafka. Apache Kafka guarantees strict order on a topic by partition means each partition is an immutable sequence of message appending in a strict order. So in case, more than one partition consumer may consume messages from more than one partition which can't be in strict order. We can consider below 2 options to achieve strict order.

  1. If looking for 1 producer message in order use only 1 partition per topic. so the producer will publish on the same partition in sequence order which will get consumed by consumers in strict order.

enter image description here

  1. Producer publishes a message to multi-partition, so use multi-consumer in consumer group but use assign to specific partition per consumer to consume message from specific partition will guarantee strict order per partition per consumer

enter image description here

Nitin
  • 3,533
  • 2
  • 26
  • 36
  • 2
    Thanks for this answer, although it doesn't answer my question completely but it provides a different perspective to the problem in terms of Kafka capabilities. Does Kafka reorder messages in a topic partition if they come in out of order due to network partition or does it simply maintain the original order the messages were received in? – Mike Oct 22 '19 at 04:06