8

I see many articles advising against memory-mapping huge files, so that the virtual address space isn't consumed by the mapping alone.

How does that change in a 64-bit process, where the address space increases dramatically? If I need to randomly access a file (dozens of GBs in size), is there a reason not to map the whole file at once?

Saar

3 Answers

8

On 64-bit, go ahead and map the file.

One thing to consider, based on Linux experience: if the access is truly random and the file is much bigger than you can expect to cache in RAM (so the chances of hitting a page again are slim), then it can be worth passing MADV_RANDOM to madvise. This stops the touched file pages from steadily accumulating and pointlessly pushing other, actually useful data out of memory. No idea what the Windows equivalent API is, though.
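As a rough illustration of the advice above, here is a minimal Python sketch that maps a whole file read-only and passes the MADV_RANDOM hint. The function name `map_for_random_access` is hypothetical, and the sketch assumes Python 3.8 or later; `mmap.madvise` and `mmap.MADV_RANDOM` are only present on platforms that provide the underlying system call, so the hint is skipped elsewhere.

```python
import mmap


def map_for_random_access(path):
    """Map an entire (non-empty) file read-only and hint random access.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        # Length 0 means "map the whole file". The mmap object keeps its own
        # duplicate of the file descriptor, so closing f afterwards is safe.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # The hint is advisory and platform-dependent, so apply it only
    # when this Python build exposes it.
    if hasattr(mm, "madvise") and hasattr(mmap, "MADV_RANDOM"):
        mm.madvise(mmap.MADV_RANDOM)
    return mm
```

The returned object can be sliced like a bytes object (for example `mm[offset:offset + 16]`) and should be closed with `mm.close()` when no longer needed.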

timday
5

There's a reason to think carefully before using memory-mapped files, even on a 64-bit platform (where virtual address space size is not an issue). It relates to (potential) error handling.

When reading the file "conventionally", any I/O error is reported through the return value of the relevant function. The rest of the error handling is up to you.

OTOH, if the error arises during implicit I/O (resulting from a page fault and the attempt to load the needed portion of the file into the corresponding memory page), the error handling mechanism depends on the OS.

On Windows the error handling is performed via SEH, so-called "structured exception handling". The exception propagates to user mode (the application's code), where you have a chance to handle it properly. Proper handling requires compiling with the appropriate exception handling settings in the compiler (to guarantee that destructors are invoked, if applicable).

I don't know how error handling is performed on Unix/Linux, though.

P.S. I'm not saying don't use memory-mapped files. I'm saying use them carefully.

valdo
  • 3
    @David Heffernan: not exactly, it depends on what exactly you are reading. If an error occurs while loading program code or data (global, stack/TLS, or heap), the process is simply terminated. The OS does not give the application an opportunity to handle this, because the application is already "damaged". OTOH, errors that arise from a memory-mapped file that the application created on its own behalf have a much better chance of being handled properly – valdo Mar 07 '12 at 21:43
  • 1
    So you are saying that errors with memory mapped files are different from, say, reading a dud pointer? In any case I can't see the relevance of your answer to the question. Even if it is sound advice, it is orthogonal to the question asked. – David Heffernan Mar 07 '12 at 21:46
  • 2
    @David Heffernan: sure. The OS doesn't know that you're "reading a dud pointer". From its perspective you attempted to dereference an inaccessible virtual address; it raises an exception, and your application has a chance to handle it. Whether it was a bug or a legitimate condition is up to the application. I agree that it's orthogonal to the question "map the whole file at once or in pieces". I thought the question was mapping vs. other alternatives – valdo Mar 07 '12 at 22:25
  • 1
    The question is mapping the entire file vs. mapping small blocks – David Heffernan Mar 07 '12 at 22:26
2

One thing to be aware of is that memory mapping requires a big contiguous chunk of virtual address space when the mapping is created. On a 32-bit system this is particularly painful: once a process's address space is fragmented, long contiguous runs of free addresses are unlikely, and the mapping will fail. On a 64-bit system this is much easier, because the 64-bit address space is enormous.

If you are running code in a controlled environment (e.g. 64-bit servers that you build yourself and know will run this code just fine), go ahead and map the entire file and just deal with it.

If you are writing general-purpose code that could run on any number of configurations, you'll want to stick to a smaller, chunked mapping strategy. For example, map large files as collections of 1GB chunks behind an abstraction layer that takes operations like read(offset) and translates them to an offset within the right chunk before performing the operation.
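A minimal Python sketch of such an abstraction layer might look like the following. The class name `ChunkedMapping` and its methods are hypothetical, and the sketch assumes a read-only file whose size does not change while it is mapped. Chunks are mapped lazily, and a read that straddles a chunk boundary is stitched together from both chunks.

```python
import mmap
import os


class ChunkedMapping:
    """Read-only view of a file mapped in fixed-size chunks (illustrative only)."""

    def __init__(self, path, chunk_size=1 << 30):
        # mmap offsets must be multiples of the allocation granularity.
        if chunk_size <= 0 or chunk_size % mmap.ALLOCATIONGRANULARITY:
            raise ValueError("chunk_size must be a positive multiple of "
                             "mmap.ALLOCATIONGRANULARITY")
        self._file = open(path, "rb")
        self._size = os.fstat(self._file.fileno()).st_size
        self._chunk_size = chunk_size
        self._chunks = {}  # chunk index -> mmap object, created on first use

    def _chunk(self, index):
        mm = self._chunks.get(index)
        if mm is None:
            start = index * self._chunk_size
            # The last chunk may be shorter than chunk_size.
            length = min(self._chunk_size, self._size - start)
            mm = mmap.mmap(self._file.fileno(), length,
                           access=mmap.ACCESS_READ, offset=start)
            self._chunks[index] = mm
        return mm

    def read(self, offset, length):
        """Return up to `length` bytes starting at absolute file `offset`."""
        if offset < 0 or length < 0:
            raise ValueError("offset and length must be non-negative")
        end = min(offset + length, self._size)
        parts = []
        while offset < end:
            # Translate the absolute offset into (chunk, offset within chunk).
            index, within = divmod(offset, self._chunk_size)
            take = min(end - offset, self._chunk_size - within)
            parts.append(self._chunk(index)[within:within + take])
            offset += take
        return b"".join(parts)

    def close(self):
        for mm in self._chunks.values():
            mm.close()
        self._chunks.clear()
        self._file.close()
```

Note that this sketch never unmaps a chunk once it has been touched, so on a 32-bit process it would still need some eviction policy (for example, closing the least recently used chunk) to keep the total mapped size bounded.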

Hope that helps.

Riyad Kalla