
I want to work on a file which is composed of 4 KiB blocks.

As things progress, I will write more data, map new parts, and unmap parts that I no longer need.

Is an mmap() of just 4 KiB too small when the total amount of file data to map is around 4 GiB (i.e. some 1,048,576 individually mapped blocks)?

I'm worried that making so many small mmap() calls will not be efficient in the end, even if each one targets exactly the block I want to use. At the same time, it may still be better than reading and writing those blocks with read()/write() each time I change one byte.
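For concreteness, here is a minimal sketch of the per-block pattern I have in mind (the file name and block index are placeholders):

```c
/*
 * Minimal sketch of the per-block pattern in question: map one 4 KiB
 * block, modify it, unmap it. The file name and block index are
 * placeholders.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BLOCK_SIZE 4096u

int main(void)
{
    int fd = open("data.bin", O_RDWR);          /* placeholder file */
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    size_t block = 42;                          /* placeholder block index */
    off_t offset = (off_t)block * BLOCK_SIZE;   /* mmap offsets must be page-aligned */

    void *p = mmap(NULL, BLOCK_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, offset);
    if (p == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

    memset(p, 0xff, BLOCK_SIZE);                /* modify the block in place */

    munmap(p, BLOCK_SIZE);                      /* unmap when no longer needed */
    close(fd);
    return EXIT_SUCCESS;
}
```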

Alexis Wilke

3 Answers


There is no shortage of address space on 64-bit architectures. Unless your code also has to work on 32-bit architectures (rare these days), map the whole file once and avoid the overhead of many mmap() calls and thousands of extra kernel objects. As for how reads and writes behave, it depends on your desired semantics. See this answer.
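A minimal sketch of the map-once approach, assuming the file already exists; "data.bin" and the byte offset touched are placeholders:

```c
/*
 * Minimal sketch: map the entire file once, then touch whatever bytes
 * you need. "data.bin" and the offset are placeholders.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDWR);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return EXIT_FAILURE; }

    /* One mmap() for the whole file; the kernel pages data in lazily. */
    unsigned char *base = mmap(NULL, (size_t)st.st_size,
                               PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

    base[12345] ^= 1;   /* change one byte anywhere (assumes the file is large enough) */

    munmap(base, (size_t)st.st_size);
    close(fd);
    return EXIT_SUCCESS;
}
```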

Anton Tykhyy

On 64-bit systems you should pretty much map the entire file, or at least the entire range, in one go and let the operating system handle the paging in and out for you. The mmap() calls do have some overhead of their own. In practice the user address space on x86-64 is about 128 TiB, so you should be able to map, say, 1 TiB files/ranges without any problems.
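As a quick Linux-specific experiment illustrating the point: reserving 1 TiB of virtual address space succeeds on x86-64 and commits no physical memory, since nothing is touched (PROT_NONE and MAP_NORESERVE are one way to express a pure reservation):

```c
/*
 * Experiment: reserve 1 TiB of address space. No physical memory is
 * committed because no page is ever touched.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
    size_t one_tib = (size_t)1 << 40;   /* 1 TiB of address space */

    void *p = mmap(NULL, one_tib, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

    printf("reserved 1 TiB at %p\n", p);
    munmap(p, one_tib);
    return EXIT_SUCCESS;
}
```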


As far as I understand it, even a single mmap() that covers several contiguous 4 KiB pages will require the kernel (and the TLB, MMU...) to deal with as many virtual/physical associations as there are pages; that is the whole purpose of memory pages: contiguous virtual pages can be mapped to non-contiguous physical pages.
So, once the mapped pages are set up, whether by a single mmap() call or by many, their actual usage should show no difference in performance.
But each call to mmap() probably incurs some overhead in order to choose the part of the virtual address space to use; a single mmap() call only has to find a big enough virtual region once (which should not be too difficult on a 64-bit system, as stated in other answers), but repeated calls incur this overhead many times over.

So, if I had to deal with this situation on a 64-bit system, I would mmap() the entire file at once, using huge pages in order to reduce the pressure on the TLB.
Note that mapping the entire file at once does not mean consuming the same amount of physical memory at that moment; the virtual/physical association for each page only occurs when that page is accessed for the first time.
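A sketch of that whole-file mapping with a huge-page hint; madvise(MADV_HUGEPAGE) is a Linux-specific request that the kernel is free to ignore (MAP_HUGETLB would instead require hugetlbfs to be configured), and "data.bin" is a placeholder:

```c
/*
 * Sketch: map the whole file once and hint that transparent huge pages
 * should back it. The kernel may ignore the hint; physical pages are
 * only allocated on first touch.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDWR);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return EXIT_FAILURE; }

    unsigned char *base = mmap(NULL, (size_t)st.st_size,
                               PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

    /* Ask for huge pages to reduce TLB pressure; non-fatal if refused. */
    if (madvise(base, (size_t)st.st_size, MADV_HUGEPAGE) < 0)
        perror("madvise");

    base[0] = 1;   /* first touch of a page is what allocates it */

    munmap(base, (size_t)st.st_size);
    close(fd);
    return EXIT_SUCCESS;
}
```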

prog-fh
  • Huge pages would cause the system to have to write back megabytes for every byte change? – Antti Haapala -- Слава Україні Oct 06 '19 at 08:35
  • @AnttiHaapala I don't think so. The synchronisation does not happen after each single byte alteration; it is controlled with `msync()` (or happens at `munmap()` at the latest) – see the sketch below the comments. – prog-fh Oct 06 '19 at 08:39
  • I just checked the servers on which the final product will run and they don't have huge pages available anyway. They are 64-bit, though, so mapping the whole file into the large address space is definitely an option. – Alexis Wilke Oct 06 '19 at 08:49
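A minimal sketch of the `msync()` point from the comments ("data.bin" is a placeholder): any number of in-memory stores, then one explicit flush of the dirty range:

```c
/*
 * Sketch: dirty pages of a shared file mapping are flushed when asked
 * (msync), by the kernel's writeback, or after munmap() at the latest,
 * not after every byte store.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDWR);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return EXIT_FAILURE; }

    unsigned char *base = mmap(NULL, (size_t)st.st_size,
                               PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

    base[0] ^= 1;   /* many byte changes could happen here... */

    /* ...followed by a single synchronous flush of the whole range. */
    if (msync(base, (size_t)st.st_size, MS_SYNC) < 0)
        perror("msync");

    munmap(base, (size_t)st.st_size);
    close(fd);
    return EXIT_SUCCESS;
}
```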