Scheduling memory-bound tasks in Java

Question

Suppose I have a large batch of memory-bound tasks that are quite independent of one another. To make things concrete, let's say I can allocate 30GB for the heap and that each task requires on average about 3GB of memory at its peak, but with some variability both over time and from task to task. A few tasks here and there might even require 6GB.

In this case, it seems more efficient to try to run 10 (or arguably even more) tasks concurrently and, if and when we bump into the memory limit, have the task wait, much as we do with other shared resources such as I/O or specific memory addresses (which are accessed through locks).

Is it possible to do this in Java? More generally, what's the best way to handle memory-bound task scheduling in Java?

Some Related Questions and "Close Misses"

This question asks whether it's possible to have threads in Java wait for memory instead of throwing an OOM exception, but the answers seem to focus on why this is a bad idea to begin with, perhaps because the question suggests an unreasonable number of threads. Also, I guess treating all memory requests as equal can lead to deadlocks. So I want to emphasize that here we are talking about only 10 or so tasks, and the desire to "max out" the memory usage seems like a very natural one. I do not mind wrapping my tasks in some suitable logic that will distinguish their memory requests as having lower priority. I can even accept a solution where I need to identify the class whose instances are filling up the memory and maybe add some suitable counter, but I'd prefer a platform-independent solution that works "out of the box", if there is one.

This question also asks about scheduling memory-bound tasks but seems to presuppose a specific solution framework.

The best way is to use direct access to off-heap memory (see, for example, https://github.com/anatolygudkov/workshops/tree/master/offheap-memory) and make sure you address the memory in a sequential (cache-friendly) manner if possible. Also, make sure you share as little memory as possible between threads and that the sharing is correct in terms of the JMM. - AnatolyG, Aug 11 '21 at 20:13

pveentjer · Answer 1 · 2021-08-15T08:16:47.860

The problem is that within a single JVM you have very little control over how much memory a single thread is going to use, unless you make use of off-heap memory (e.g. using Unsafe or direct memory, as AnatolyG already mentioned). If you have huge array allocations, you could also control these. But we need to know more about the data structures that consume the most memory.
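As a rough illustration of controlling large allocations yourself, here is a minimal sketch that routes direct (off-heap) buffer requests through a counter with a hard limit. The class name `TrackedDirectAllocator` and its methods are hypothetical, and the sketch assumes all big buffers go through this one allocator. Note that the counter is only bookkeeping: the native memory behind a direct buffer is actually freed when the buffer is garbage collected, not when `release` is called.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: accounts for direct-buffer allocations against a fixed limit. */
class TrackedDirectAllocator {
    private final long limitBytes;
    private final AtomicLong reserved = new AtomicLong();

    TrackedDirectAllocator(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Returns a direct buffer, or null if the request would exceed the limit. */
    ByteBuffer tryAllocate(int bytes) {
        while (true) {
            long current = reserved.get();
            if (current + bytes > limitBytes) {
                return null; // caller decides whether to wait, retry, or fail
            }
            // Reserve first, allocate second, so concurrent callers cannot overshoot.
            if (reserved.compareAndSet(current, current + bytes)) {
                try {
                    return ByteBuffer.allocateDirect(bytes);
                } catch (OutOfMemoryError e) {
                    reserved.addAndGet(-bytes); // undo the reservation
                    throw e;
                }
            }
        }
    }

    /** Returns the reservation; the caller must drop its reference to the buffer. */
    void release(ByteBuffer buffer) {
        reserved.addAndGet(-buffer.capacity());
    }

    long reservedBytes() {
        return reserved.get();
    }
}
```

A task that gets `null` back could wait and retry, which gives the "wait for memory" behaviour from the question, but only for allocations that go through the allocator.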

But if you have arbitrary object graphs that you don't have much control over, perhaps it is smarter to model the problem using multiple processes. You have one intake controller process and then a bunch of worker processes, and for each process you can configure the maximum amount of heap its JVM is allowed to use.
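A minimal sketch of how a controller could start one worker JVM per task with its own heap cap, using `ProcessBuilder` and the standard `-Xmx` flag. The class `WorkerLauncher`, the worker main class name, and the idea of passing a task id as the only argument are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical controller-side helper that starts worker JVMs with a heap limit. */
class WorkerLauncher {

    /** Builds the command line for one worker JVM capped at maxHeapGb of heap. */
    static List<String> workerCommand(int maxHeapGb, String mainClass, String taskId) {
        // Reuse the java binary and classpath of the controller process.
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + maxHeapGb + "g"); // per-process heap ceiling
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass); // assumed: a class with a main method that runs one task
        cmd.add(taskId);
        return cmd;
    }

    /** Starts the worker; the caller can waitFor() it and inspect the exit code. */
    static Process launch(int maxHeapGb, String mainClass, String taskId) throws IOException {
        return new ProcessBuilder(workerCommand(maxHeapGb, mainClass, taskId))
                .inheritIO() // forward the worker's stdout/stderr to the controller
                .start();
    }
}
```

A worker that exceeds its cap fails with an `OutOfMemoryError` in its own process, so the controller can see the non-zero exit code and requeue the task with a larger limit without affecting the other workers.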

Bumping into memory limits at the OS level can be a huge PITA because it can lead to swapping, which makes every thread in the system slow, or even worse, to the OOM killer terminating the process. Make sure you set vm.swappiness to a very low value to prevent premature swapping.

Do you know up front how much memory a task is going to consume? If so, you could keep track of the total amount of memory reserved by the tasks currently in the system and not admit new tasks until enough running tasks have completed.
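One way to sketch this bookkeeping is a counting semaphore whose permits stand for gigabytes of heap. This assumes each task can declare a peak-memory estimate up front; the class `MemoryBudgetScheduler` and its methods are hypothetical, and the estimate is purely an accounting figure that the JVM does not enforce.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical scheduler: admits a task only while its declared peak fits the budget. */
class MemoryBudgetScheduler {
    private final int totalGb;
    private final Semaphore budgetGb;
    private final ExecutorService pool = Executors.newCachedThreadPool();

    MemoryBudgetScheduler(int totalGb) {
        this.totalGb = totalGb;
        // Fair mode, so a large request is not starved by a stream of small ones.
        this.budgetGb = new Semaphore(totalGb, true);
    }

    /** Blocks the caller until estimatedGb of budget is free, then starts the task. */
    void submit(int estimatedGb, Runnable task) throws InterruptedException {
        if (estimatedGb > totalGb) {
            throw new IllegalArgumentException("task can never fit in the budget");
        }
        budgetGb.acquire(estimatedGb);
        try {
            pool.execute(() -> {
                try {
                    task.run();
                } finally {
                    budgetGb.release(estimatedGb); // give the budget back on completion
                }
            });
        } catch (RuntimeException e) {
            budgetGb.release(estimatedGb); // task was rejected and never ran
            throw e;
        }
    }

    int availableGb() {
        return budgetGb.availablePermits();
    }

    /** Stops accepting tasks and waits for the running ones to finish. */
    boolean shutdownAndWait(long timeout, TimeUnit unit) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeout, unit);
    }
}
```

With a 30GB budget, ten tasks declared at 3GB run concurrently, while a task declared at 6GB waits until two of them have finished. Because a task acquires its whole estimate in a single call and never asks for more while running, tasks cannot deadlock on partially granted budget.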

If you don't know the memory requirements up front, you could assume each task will use the maximum, but this can lead to under-utilization of memory.
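To put numbers on the under-utilization, here is a small worked calculation using the figures from the question (30GB heap, 6GB worst case, 3GB average peak). The class and method names are made up for the example.

```java
/** Worked example: cost of sizing every task for the worst case. */
class WorstCaseSizing {

    /** Number of tasks that fit if each one is budgeted at the worst case. */
    static int maxConcurrent(int heapGb, int worstCaseGb) {
        return heapGb / worstCaseGb;
    }

    /** Fraction of the heap used on average under worst-case budgeting. */
    static double expectedUtilization(int heapGb, int worstCaseGb, double averageGb) {
        return maxConcurrent(heapGb, worstCaseGb) * averageGb / heapGb;
    }

    public static void main(String[] args) {
        System.out.println(maxConcurrent(30, 6));            // prints 5
        System.out.println(expectedUtilization(30, 6, 3.0)); // prints 0.5
    }
}
```

So worst-case budgeting runs 5 tasks at a time instead of roughly 10, and on average only about half of the heap is in use.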
