5

I have a set of files whose lengths are all multiples of the page-size of my operating system (FreeBSD 10). I would like to mmap() these files to consecutive pages of RAM, giving me the ability to treat a collection of files as one large array of data.

Preferably using portable functions, how can I find a sufficiently large region of unmapped address space so I can be sure that a series of mmap() calls to this region is going to be successful?

fuz
  • May I know what you're tryna do? – cadaniluk Jan 01 '16 at 21:38
  • @cad See first paragraph. Basically, I have a dataset which is split into multiple files and I want to map it into a contiguous memory region to treat it as one. – fuz Jan 01 '16 at 21:39
  • Can you `mmap()` the first file letting the o/s choose the address for you, and then try to map the other files contiguously with that? I'd expect that to work reasonably well — but I've not tested it on any system, least of all FreeBSD 10. – Jonathan Leffler Jan 01 '16 at 21:41
  • @JonathanLeffler The data set is some 500 GB in size and each chunk is 50 MB. It's very likely that the OS fits the first mapping somewhere in the low address range without 500 GB free range above it. – fuz Jan 01 '16 at 21:43
  • 1
    It would be sensible to include such size information in the question. To be even contemplating 500 GiB in memory, you must be on a large 64-bit machine. That means there are large (even larger than 500 GiB) gaps in the memory map — the 64-bit address space is a million times bigger than that (with some space left over). You could probably argue that you could choose almost any well aligned address and probably get away with it. You might need to look at where your shared libraries, stack, heap are, just to make sure you stay clear of those. The 'try asking' approach in the answer is similar. – Jonathan Leffler Jan 01 '16 at 21:47

2 Answers

6

Follow these steps:

  1. First compute the total size needed by enumerating your files and summing their sizes.
  2. Map a single area of anonymous memory of this size with mmap. If this fails, you lose.
  3. Save the pointer and unmap the area (actually, unmap may not be necessary if your system's mmap with a fixed address implicitly unmaps any previous overlapping region).
  4. Map the first file at this address with the appropriate MAP_FIXED flag.
  5. Increment the address by the file size.
  6. Loop back to step 4 until all files have been mapped.

This should be fully portable to any POSIX system, though some OSes may have quirks that prevent this method from working, so try it.
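A minimal sketch of these steps in C, assuming the file names arrive on the command line and (as in the question) every file size is a multiple of the page size. Error handling is kept to a bare minimum, MAP_ANON may be spelled MAP_ANONYMOUS on some systems, and the reservation uses PROT_NONE so it is not charged against available memory (the second answer below elaborates on this):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    struct stat st;
    size_t total = 0;

    /* Step 1: sum the sizes of all files. */
    for (int i = 1; i < argc; i++) {
        if (stat(argv[i], &st) == -1) { perror("stat"); return 1; }
        total += (size_t)st.st_size;
    }

    /* Step 2: reserve one large anonymous region; PROT_NONE avoids
     * committing memory for the reservation. */
    char *base = mmap(NULL, total, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* Steps 4-6: map each file over the reservation with MAP_FIXED.
     * Step 3's explicit munmap is skipped because MAP_FIXED replaces
     * the overlapping anonymous pages. */
    char *addr = base;
    for (int i = 1; i < argc; i++) {
        int fd = open(argv[i], O_RDONLY);
        if (fd == -1) { perror("open"); return 1; }
        if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }
        if (mmap(addr, (size_t)st.st_size, PROT_READ,
                 MAP_PRIVATE | MAP_FIXED, fd, 0) == MAP_FAILED) {
            perror("mmap file");
            return 1;
        }
        close(fd);              /* the mapping survives the close */
        addr += st.st_size;
    }

    /* base now points at all files laid out back to back. */
    printf("mapped %zu bytes at %p\n", total, (void *)base);
    return 0;
}
```

Because MAP_FIXED simply replaces the overlapping anonymous pages, the reservation never has to be unmapped first, which also avoids the race with other threads mentioned in the comments below.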

chqrlie
  • 1
    Great idea! To get the initial probe mapping, I can create a sparse file of the desired length and map that, since my operating system won't let me map more anonymous memory than I have RAM (as far as I can tell). – fuz Jan 01 '16 at 21:40
  • 2
    Oh yeah, you don't even need to unmap the area: `mmap` will happily map right over it with `MAP_FIXED`, as far as I can tell. – fuz Jan 01 '16 at 21:54
  • 1
    This also has the side-effect of avoiding the situation where another thread races with you for the address range you just cleared for use. – fuz Jan 01 '16 at 22:18
  • @FUZxxl - I missed that. Comment deleted. – Andrew Henle Jan 01 '16 at 22:23
2

You could mmap a large region whose size is the sum of the sizes of all files, using MAP_PRIVATE | MAP_ANON and protection PROT_NONE, which keeps the OS from unnecessarily committing memory charges for the reservation.

This will reserve but not commit memory.

You could then map filename1 at [baseAddr, baseAddr + size1), filename2 at [baseAddr + size1, baseAddr + size1 + size2), and so on.

I believe the flags for this are MAP_FIXED | MAP_PRIVATE.
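A compact sketch of this variant, with the hypothetical names filename1/filename2 and caller-supplied page-multiple sizes size1/size2 from the text above; error checking is omitted, and MAP_ANON may be spelled MAP_ANONYMOUS on some systems:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: lay two files out back to back in one reservation. */
static void *map_pair(const char *filename1, size_t size1,
                      const char *filename2, size_t size2)
{
    /* Reserve the whole range; PROT_NONE reserves but does not commit. */
    char *baseAddr = mmap(NULL, size1 + size2, PROT_NONE,
                          MAP_PRIVATE | MAP_ANON, -1, 0);

    /* Place each file inside the reservation with MAP_FIXED | MAP_PRIVATE. */
    int fd1 = open(filename1, O_RDONLY);
    mmap(baseAddr, size1, PROT_READ, MAP_FIXED | MAP_PRIVATE, fd1, 0);
    close(fd1);

    int fd2 = open(filename2, O_RDONLY);
    mmap(baseAddr + size1, size2, PROT_READ, MAP_FIXED | MAP_PRIVATE, fd2, 0);
    close(fd2);

    return baseAddr;    /* [baseAddr, baseAddr + size1 + size2) covers both */
}
```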

Aaditya Kalsi