
I want to create a consumer that processes messages from a variable number of sources, which are connected or disconnected dynamically.

What I need is for each consumer to prioritize the first N messages of each source, and then to run multiple consumers to improve the speed.

I have read the docs for Work Queues, Routing, and Topics, along with many others, without working out how to implement this. I also ran some tests, without luck.

Can someone point me to how to do it, or to where I can read about it?

--EDIT--

QueueA-----A3--A2--A1-┐

QueueB-----B3--B2--B1-┼------ Consumer

QueueC-----C3--C2--C1-┘

The desired effect is that each consumer gets the first messages of each queue in turn. For example: A1, B1, C1, A2, B2, C2, A3, B3, C3, and so on. If a new queue is created (QueueD), the consumer should start receiving messages from it in the same fashion.

Thanks in advance

jamg
  • You're going to have to provide more details as to what it is you are trying to achieve and what your issue is. As-is, every message-based application does what you describe. – theMayer May 02 '18 at 20:20

1 Answer


What I need is for each consumer to prioritize the first N messages of each source, and then to run multiple consumers to improve the speed.

All message queues that I know of only provide ordering guarantees within a single queue (Kafka provides its ordering guarantee not at the topic level but within each partition of a topic). Here, however, you are asking to serialize across multiple queues, which will not be possible in a distributed system context.

Why? Because if you have more than one consumer on these queues, each queue delivers its messages to its connected consumers in a round-robin fashion.

Assume prefetch_count=1 and two connected consumers, and say the first set of messages is delivered as follows:

  • A1, B1 & C1 delivered to consumer 1 (X)
  • A2, B2 & C2 delivered to consumer 2 (Y)

Now, in a distributed system, everything is async, and things can go wrong. For example:

If X acks A1 first, A3 will be delivered to X. But if Y acks A2 before X acks A1, A3 will be delivered to Y.

Who acks first is not within your control in a distributed system. Consider the following scenarios:

  • X might have to wait on an I/O- or CPU-bound task, while Y might get lucky and not have to wait. Then Y will advance through the messages in the queue.
  • Or Y gets killed (or cut off by a network partition), or the network gets slow, and X continues consuming the queue.
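
To make the race concrete, here is a small in-memory simulation (no broker involved, so it is only a sketch of the behaviour described above). The function name `simulate` and the processing times are made up for illustration; it assumes prefetch_count=1, i.e. a consumer gets its next message only after acking the previous one.

```python
import heapq


def simulate(queue, durations):
    """Simulate one queue delivering to consumers with prefetch_count=1.

    queue     -- messages in queue order
    durations -- consumer name -> time it takes to process and ack one
                 message (hypothetical numbers, not measured values)
    Returns a list of (message, consumer) pairs in delivery order.
    """
    pending = list(queue)
    deliveries = []
    acks = []  # min-heap of (time of next ack, consumer)

    # Initial round-robin delivery: one unacked message per consumer.
    for consumer, duration in durations.items():
        if pending:
            deliveries.append((pending.pop(0), consumer))
            heapq.heappush(acks, (duration, consumer))

    # Whoever acks first frees a slot and receives the next message.
    while pending and acks:
        now, consumer = heapq.heappop(acks)
        deliveries.append((pending.pop(0), consumer))
        heapq.heappush(acks, (now + durations[consumer], consumer))

    return deliveries


if __name__ == "__main__":
    msgs = ["A1", "A2", "A3", "A4"]
    print(simulate(msgs, {"X": 1.0, "Y": 3.0}))  # X is fast
    print(simulate(msgs, {"X": 3.0, "Y": 1.0}))  # Y is fast
```

With the same queue contents, A3 ends up on X in the first run and on Y in the second, purely because of who finished first.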

I'd strongly advise you to rethink your requirements and consider which guarantees you can expect in an async context (you wouldn't be considering message-oriented middleware (MoM) otherwise, would you?).


PS: It is possible to implement what you are asking for with some consumer-side logic (at a cost in performance/throughput):

  • A single consumer connects to all queues.
  • It waits for a message from every queue before acking any of them.
  • Once it has received a message from every queue, it groups them into a single message and publishes that to another queue (P).
  • Any number of consumers can then subscribe to P to process the ordered groups of messages.
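
The grouping step could look roughly like the sketch below. It is plain in-memory Python with no broker client: the class name `Aggregator` and its methods are hypothetical, and the actual consuming, acking, and publishing to P are left out and would have to be wired up with whatever client library you use.

```python
from collections import deque


class Aggregator:
    """Buffers messages per source and emits one group per 'round'.

    Hypothetical helper: a real consumer would call on_message() from its
    delivery callback, publish each returned group to queue P, and only
    then ack the original messages.
    """

    def __init__(self):
        self.buffers = {}  # source queue name -> deque of buffered messages

    def add_source(self, name):
        # Register a newly created queue (e.g. QueueD); no-op if known.
        self.buffers.setdefault(name, deque())

    def on_message(self, source, message):
        """Buffer a message and return the list of completed groups."""
        self.add_source(source)
        self.buffers[source].append(message)
        groups = []
        # A group is complete once every known source has a message waiting.
        while self.buffers and all(self.buffers.values()):
            groups.append([buf.popleft() for buf in self.buffers.values()])
        return groups


if __name__ == "__main__":
    agg = Aggregator()
    for name in ("QueueA", "QueueB", "QueueC"):
        agg.add_source(name)
    for source, msg in [("QueueA", "A1"), ("QueueA", "A2"),
                        ("QueueB", "B1"), ("QueueC", "C1")]:
        for group in agg.on_message(source, msg):
            print(group)
```

Note that in this sketch a source that stops sending stalls every group, which is part of the throughput penalty mentioned above.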

I do not advise it, but hey, it is your system, who is going to stop you ;)

Arun Karunagath