
I have a data structure that I'd like to rework to page out on-demand. mmap seems like an easy way to run some initial experiments. However, I want to limit the amount of buffer cache that the mmap uses. The machine has enough memory to page the entire data structure into cache, but for test reasons (and some production reasons too) I don't want to allow it to do that.

Is there a way to limit the amount of buffer cache used by mmap?

Alternatively, an mmap alternative that can achieve something similar and still limit memory usage would work too.

JaredC
  • Buffer cache is excellently maintained automatically by the kernel. By itself it won't cause out-of-memory errors. Why do you want to control it yourself? – fukanchik Jul 31 '17 at 17:52
  • @fukanchik Because of my environment, I must know exactly how much memory my process will use, and limit it to that. Additionally, I have a machine that has 100GB of memory, but I'd like to test the software as if the machine only has 1GB of memory. – JaredC Jul 31 '17 at 18:22
  • I think your best bet is not to handle this from within the application, but at the OS level instead. Here's a good starting point: https://unix.stackexchange.com/questions/44985/limit-memory-usage-for-a-single-linux-process –  Jul 31 '17 at 19:04
  • In a very narrow use case, I'd completely agree. Unfortunately my data structure is just a small player in a much larger and much more complex system. My contract requires me to limit all memory usage and guarantee it, no matter what else the process is doing. – JaredC Jul 31 '17 at 20:37
  • Okay. What about the second arg `length`? Why does it not suit your needs? – fukanchik Jul 31 '17 at 21:35
  • @fukanchik the `length` argument controls the size of the virtual mapping, not the amount of physical memory that gets used in caching the mmap contents. – JaredC Aug 01 '17 at 14:26
  • 1
    @JaredC I don't think it's possible to limit physical memory usage by process on most modern operating systems. The system is free to throw RAM at them if it feels that's best, and there's not a whole lot you can do to stop it. You may have no choice but to run in a VM or container of some sort. Even then, you can't control the RAM the system assigns to your VM's manager. – David Schwartz Aug 01 '17 at 23:20

4 Answers


From my understanding, it is not possible. Memory mapping is controlled by the operating system. The kernel decides how to use the available memory in the best way, but it looks at the system as a whole. I'm not aware of process-level quotas for caches being supported (at least, I have not seen such APIs in Linux or BSD).

There is madvise to give the kernel hints, but it does not support limiting the cache used by one process. You can give it hints like MADV_DONTNEED, which will reduce the pressure on the cache for other applications, but I would expect it to do more harm than good, as it will most likely make caching less efficient, which will lead to more IO load on the system in total.

I see only two alternatives. One is trying to solve the problem at the OS level, and the other is to solve it at the application level.

At the OS level, I see two options:

  1. You could run a virtual machine, but most likely this is not what you want. I would also expect that it will not improve the overall system performance. Still, it would be at least a way to define upper limits on the memory consumption.
  2. Docker is another idea that comes to mind. It also operates at the OS level, but to the best of my knowledge it does not support defining cache quotas, so I don't think it will work.

That leaves only one option, which is to look at the application level. Instead of using memory mapped files, you could use explicit file system operations. If you need to have full control over the buffer, I think it is the only practical option. It is more work than memory mapping, and it is also not guaranteed to perform better.

If you want to stay with memory mapping, you could also map only parts of the file in memory and unmap other parts when you exceed your memory quota. It also has the same problem as the explicit file IO operations (more implementation work and non-trivial tuning to find a good caching strategy).

Having said that, you could question the requirement to limit the cache memory usage. I would expect that the kernel does a pretty good job of allocating memory resources. At least, it will likely be better than the solutions I have sketched. (Explicit file IO plus an internal cache might be fast, but it is not trivial to implement and tune. Here is a comparison of the trade-offs: mmap() vs. reading blocks.)

During testing, you could run the application with ionice -c 3 and nice -n 20 to somewhat reduce the impact on the other productive applications. There is also a tool called nocache. I have never used it, but reading through its documentation, it seems related to your question.

Alexis Wilke
Philipp Claßen

It might be possible to accomplish this through the use of mmap() and Linux Control Groups (more generally, here or here). Once the cgroup tools are installed, you can create arbitrary limits on, among other things, the amount of physical memory used by a process. As an example, here we limit the physical memory to 128 MiB and memory-plus-swap to 256 MiB:

cgcreate -g memory:/limitMemory
echo $(( 128 * 1024 * 1024 )) > /sys/fs/cgroup/memory/limitMemory/memory.limit_in_bytes
echo $(( 256 * 1024 * 1024 )) > /sys/fs/cgroup/memory/limitMemory/memory.memsw.limit_in_bytes
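The commands above target the cgroup v1 memory controller; on a cgroup v2 system the knob is memory.max instead. A hedged sketch of putting a process under the limit (cgexec comes from the libcgroup tools; ./my_program and the exact /sys/fs/cgroup paths are placeholders that vary by distribution):

```shell
# cgroup v1, libcgroup tools: launch the program inside the limited group
cgexec -g memory:limitMemory ./my_program

# rough cgroup v2 equivalent (paths may differ per distribution)
mkdir /sys/fs/cgroup/limitMemory
echo $(( 128 * 1024 * 1024 )) > /sys/fs/cgroup/limitMemory/memory.max
echo $$ > /sys/fs/cgroup/limitMemory/cgroup.procs   # move current shell in
```

Both variants require root (or delegated cgroup permissions), and the limit covers all memory the process uses, not just the page cache attributable to the mmap.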
David Hoelzer

I would go the route of only mapping parts of the file at a time, so you retain full control over exactly how much memory is used.

Chuck Norrris

You could use a System V IPC shared memory segment; then you are the master of your memory segments.

sancelot