I’m developing a network proxy application using Java 8. For ingress, the main logic is the data-processing loop: take a packet from the inbound queue, process its content (e.g. protocol adaptation), and put it on the send queue. The design allows multiple virtual TCP channels, so each data-processing thread, out of a list of data-processing threads, handles a bunch of channels for a given time slice as its part of the whole job (e.g. the channels with `channel.channelId % NUM_DATA_PROCESSING_THREADS == 0`, as determined by a load-balancing scheduler). Channels are stored in an array and accessed using the `channelId` as the index of the cell. The array is wrapped by a class that provides methods like `register`, `deregister`, `getById`, `size`, etc., and its instance is called `CHANNEL_STORE` in the program.
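For context, here is a minimal sketch of what I mean by the store (the `Channel` and `ChannelStore` names and fields are simplified and illustrative, not my real code):

```java
// Illustrative only: a bare-bones version of the channel store described above.
class Channel {
    final int channelId;
    Channel(int channelId) { this.channelId = channelId; }
}

class ChannelStore {
    private final Channel[] slots;   // indexed directly by channelId
    private int count;

    ChannelStore(int capacity) { this.slots = new Channel[capacity]; }

    void register(Channel ch)  { slots[ch.channelId] = ch; count++; }
    void deregister(int id)    { if (slots[id] != null) { slots[id] = null; count--; } }
    Channel getById(int id)    { return slots[id]; }
    int size()                 { return count; }
}
```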
I need to call these methods inside the main logic (the data-processing loop) from different threads (at least the dispatcher thread, the data-processing threads, and the control thread that destroys a channel from the GUI), so I have to consider concurrency among these threads. I have several candidate approaches:
1. Use `synchronized` or reentrant locks around `register`, `deregister`, `getById`, etc. This is the simplest approach and it's thread-safe, but I have performance concerns about the locking (CAS) mechanism, since I need to perform operations on `CHANNEL_STORE` (especially `getById`) at a very high frequency. (A sketch of this is shown below the list.)
2. Delegate the operations on `CHANNEL_STORE` to a `SingleThreadExecutor` via `executor.execute(runnable)` and/or `executor.submit(callable)`. The concern is the cost of creating a Runnable/Callable at every such delegation point in the data-processing loop (creating the instance and calling `execute`): I have no idea whether this is even more expensive than `synchronized` or reentrant locks. In reality (so far) there is no post-operation, so the data-processing loop only submits a runnable and never needs to wait for a callable's return, although a post-operation is needed in the control loop. (See the second sketch below the list.)
3. Delegate the operations on `CHANNEL_STORE` to a dedicated thread via a pair of `ArrayBlockingQueue`s instead of an Executor. For each access to `CHANNEL_STORE`, put a task indicator together with its parameters as an attachment onto the first queue; the dedicated thread loops on that queue with the blocking `take` method and performs the operation on `CHANNEL_STORE`. It then puts the result onto the second queue so the delegating thread can continue with its post-operation (currently not needed, however). I regard this as the fastest option, assuming the blocking queue in the JVM is lock-free, but my concern is that the code gets very messy and error-prone. (See the third sketch below the list.)
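For approach 1, this is roughly what I have in mind (a sketch only, reusing the illustrative `Channel`/`ChannelStore` shape from above):

```java
// Approach 1 (sketch): guard every store operation with the same intrinsic lock.
class SynchronizedChannelStore {
    private final Channel[] slots;
    private int count;

    SynchronizedChannelStore(int capacity) { this.slots = new Channel[capacity]; }

    synchronized void register(Channel ch) { slots[ch.channelId] = ch; count++; }
    synchronized void deregister(int id)   { if (slots[id] != null) { slots[id] = null; count--; } }
    synchronized Channel getById(int id)   { return slots[id]; }
    synchronized int size()                { return count; }
}
```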
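For approach 2, the delegation would look something like this (also a sketch; the per-call lambda allocation is exactly what I'm worried about, and `deregisterAsync`/`getByIdAsync` are just illustrative names):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Approach 2 (sketch): a single-threaded executor owns the (non-thread-safe) store.
class ExecutorOwnedChannelStore {
    private final ChannelStore store = new ChannelStore(1024);   // the plain sketch from above
    private final ExecutorService owner = Executors.newSingleThreadExecutor();

    // Fire-and-forget from the data-processing loop (no post-operation needed there).
    void deregisterAsync(int id) {
        owner.execute(() -> store.deregister(id));
    }

    // The control thread can block on the Future when it needs the result.
    Future<Channel> getByIdAsync(int id) {
        return owner.submit(() -> store.getById(id));
    }

    void shutdown() { owner.shutdown(); }
}
```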
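And for approach 3, something along these lines (sketch; `Op`, `Task`, and the sentinel are illustrative, and the single shared results queue, as described, is part of why I find this error-prone):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Approach 3 (sketch): a dedicated thread is the only one that touches the store;
// other threads communicate with it through a pair of blocking queues.
class QueueOwnedChannelStore {
    enum Op { REGISTER, DEREGISTER, GET_BY_ID }

    // "Task indicator together with an attachment of parameters".
    static final class Task {
        final Op op; final int channelId; final Channel channel;
        Task(Op op, int channelId, Channel channel) { this.op = op; this.channelId = channelId; this.channel = channel; }
    }

    private static final Channel NOT_FOUND = new Channel(-1);   // sentinel: a BlockingQueue cannot hold null

    private final ChannelStore store = new ChannelStore(1024);  // the plain sketch from above
    private final BlockingQueue<Task> requests = new ArrayBlockingQueue<>(4096);
    private final BlockingQueue<Channel> results = new ArrayBlockingQueue<>(4096);

    void start() {
        Thread owner = new Thread(() -> {
            try {
                while (true) {
                    Task t = requests.take();                   // blocks until a task arrives
                    switch (t.op) {
                        case REGISTER:   store.register(t.channel); break;
                        case DEREGISTER: store.deregister(t.channelId); break;
                        case GET_BY_ID:
                            Channel c = store.getById(t.channelId);
                            results.put(c != null ? c : NOT_FOUND);   // reply on the second queue
                            break;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();             // exit on shutdown
            }
        }, "channel-store-owner");
        owner.setDaemon(true);
        owner.start();
    }

    // Called from the data-processing loop (fire-and-forget, no post-operation).
    void deregister(int id) throws InterruptedException {
        requests.put(new Task(Op.DEREGISTER, id, null));
    }

    // Called from the control loop when the result (post-operation) is needed.
    Channel getById(int id) throws InterruptedException {
        requests.put(new Task(Op.GET_BY_ID, id, null));
        Channel c = results.take();
        return c == NOT_FOUND ? null : c;
    }
}
```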
I think the 2nd and 3rd approaches may be what is called "serialization".
The reason I cannot simply hand tasks to a thread pool for data processing and forget about them is that the TCP stream packets of each channel must not be reordered; processing has to be serial on a per-channel basis.
Questions:

1. What is the performance of the second approach compared to the first?
2. What would you suggest for my situation?
I'm currently using stream I/O for LAN read/write. If I switched to NIO, the coordination between the NIO thread and the data-processing threads might add extra complexity (e.g. post-operations). So I think this question is relevant to time-critical (stream-based, multi-channel network) applications like mine.