6

why mmap is better than read and write

one more similar post

My question is as follows: in some scenarios people use mmap for something other than reading from files. One such piece of code is:

 *mapping = mmap(NULL, *mapping_size, PROT_READ | PROT_WRITE,
      MAP_POPULATE | MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

The above code tries to allocate a huge amount of memory. I want to know what mmap does in this case and how it works. Everyone talks about the advantages of mmap with respect to files, but code like this, where fd is set to -1, is common. What does it mean, and what are the advantages of doing so? I hope someone can clear up my doubt, which I couldn't express completely because I'm unsure how to frame it.

Thank you

Community
ANTHONY
  • From the [man](http://man7.org/linux/man-pages/man2/mmap.2.html) page: _some implementations require fd to be -1 if MAP_ANONYMOUS (or MAP_ANON) is specified, and portable applications should ensure this._ – LPs May 09 '16 at 09:55
  • Thank you for the reply. The man page says "The mapping is not backed by any file; its contents are initialized to zero. The fd and offset arguments are ignored; however, some implementations require fd to be -1 if MAP_ANONYMOUS (or MAP_ANON) is specified, and portable applications should ensure this. The use of MAP_ANONYMOUS in conjunction with MAP_SHARED is supported on Linux only since kernel 2.4." I would like to know of any practical cases where we use MAP_ANONYMOUS. – ANTHONY May 09 '16 at 12:53
  • In very _compressed_ words: whenever you don't want your memory tied to a file on disk. – LPs May 09 '16 at 13:03
  • Yes, true. But I want to know about the case where fd = -1, where no file is involved! – ANTHONY May 09 '16 at 13:06
  • It is required when the platform-specific manual says it has to be -1. – LPs May 09 '16 at 13:12
  • BTW, it is clear that if you want an anonymous mmap you must not pass an FD, and that parameter should, according to the platform specifics, be set to the null file descriptor value. – LPs May 09 '16 at 13:15
  • For more explanation about what mmap() does, see my answer here: http://stackoverflow.com/a/8507066/905902 – wildplasser Oct 06 '16 at 20:19

2 Answers

9

Let me try to address part of this question, specifically:

But these kinds of code where fd is set to -1 are frequent. What does it mean, what are the advantages of doing so.?

mmap() creates a memory mapping somewhere in the virtual address space of the process that calls it. When a file descriptor is specified, the mapping is backed by that file: under memory pressure its pages can be written back to (or simply re-read from) the file instead of going to swap. Also, since only the regions of the file that are actually accessed have to be loaded into memory, one can mmap files considerably larger than physical memory and swap space combined. See the GNU documentation.
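To make the file-backed case concrete, here is a minimal sketch in C. The helper name `count_newlines_mapped` is made up for illustration and does not come from the GNU documentation; it maps a file read-only and scans it, letting the kernel page the data in as it is touched.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes in a file through a read-only
 * mapping. Returns -1 on error. */
long count_newlines_mapped(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    long count = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == '\n')
            count++;

    munmap((void *)p, len);
    return count;
}
```

No read() call and no user-space buffer are involved here; pages are faulted in from the file on first access.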

There are several use cases where one would deliberately not specify a file descriptor and map an anonymous region of memory instead. One is extending a process's heap. Another is sharing data between processes without persisting it in a file, and thus without incurring extra I/O overhead. From the GNU doc again:

MAP_ANONYMOUS

MAP_ANON

This flag tells the system to create an anonymous mapping, not connected to a file. filedes and offset are ignored, and the region is initialized with zeros.

Anonymous maps are used as the basic primitive to extend the heap on some systems. They are also useful to share data between multiple tasks without creating a file.

On some systems using private anonymous mmaps is more efficient than using malloc for large blocks. This is not an issue with the GNU C Library, as the included malloc automatically uses mmap where appropriate.
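As a rough illustration of sharing data between tasks without a file, the sketch below creates a shared anonymous mapping before fork(), so parent and child see the same page. The helper name `value_written_by_child` is hypothetical, and the example assumes a POSIX system that supports MAP_ANONYMOUS together with MAP_SHARED.

```c
#define _DEFAULT_SOURCE             /* exposes MAP_ANONYMOUS on glibc */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child stores `value` in a shared anonymous
 * mapping and the parent reads it back. Returns -1 on error. */
int value_written_by_child(int value)
{
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {                 /* child: write, then exit immediately */
        *shared = value;
        _exit(0);
    }

    int result = -1;
    if (waitpid(pid, NULL, 0) == pid)   /* parent: wait before reading */
        result = *shared;

    munmap(shared, sizeof *shared);
    return result;
}
```

With MAP_PRIVATE in place of MAP_SHARED the child would write to its own copy-on-write page and the parent would still read the initial zero.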

However, note that anonymous mmap-ed memory can only be accessed from within the process or by its child(ren), and children only share it when the mapping was created with MAP_SHARED; with MAP_PRIVATE each child gets its own copy-on-write copy. Since the memory is anonymous, an unrelated process has no way to reference it! One would have to use shm_open() to wrap the shared memory in a named object and make it available to other processes. See this excerpt from the shm_open() man page (bolded part is mine):

shm_open() creates and opens a new, or opens an existing, POSIX shared memory object. A POSIX shared memory object is in effect a handle which can be used by unrelated processes to mmap(2) the same region of shared memory
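A minimal sketch of that approach follows. The helper name `map_named_shm` and the object name used with it are invented for this example; on older glibc versions you may also need to link with -lrt.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create (or open) the named shared memory object,
 * size it to `len` bytes and map it. Returns NULL on error. */
void *map_named_shm(const char *name, size_t len)
{
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1)
        return NULL;

    if (ftruncate(fd, (off_t)len) == -1) {  /* new objects have size 0 */
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                              /* mapping outlives the fd */
    return p == MAP_FAILED ? NULL : p;
}
```

Any process that calls `map_named_shm("/some_name", len)` with the same name gets the same memory. The object persists until someone calls shm_unlink() on that name.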

fd = -1 is simply there for portability: some systems require it before they accept an anonymous mapping, while others disregard the file descriptor entirely. See this excerpt from man mmap on Linux:

MAP_ANONYMOUS

The mapping is not backed by any file; its contents are initialized to zero. The fd and offset arguments are ignored; however, some implementations require fd to be -1 if MAP_ANONYMOUS (or MAP_ANON) is specified, and portable applications should ensure this. The use of MAP_ANONYMOUS in conjunction with MAP_SHARED is supported on Linux only since kernel 2.4.
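Put together, a portable anonymous allocation looks roughly like the sketch below. `anon_alloc` and `anon_free` are hypothetical names chosen for this example. The code in the question additionally passes MAP_POPULATE, a Linux-specific flag that pre-faults the pages at mmap time instead of on first access; it is left out here to keep the sketch portable.

```c
#define _DEFAULT_SOURCE             /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: get `len` bytes of zero-filled memory that is not
 * backed by any file. Returns NULL on error. */
void *anon_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS,
                   -1,              /* fd: -1 for portability */
                   0);              /* offset: unused for anonymous maps */
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: release a region obtained from anon_alloc().
 * Unlike free(), munmap() needs the length. Returns 0 on success. */
int anon_free(void *p, size_t len)
{
    return munmap(p, len);
}
```

Note that mmap reports failure with MAP_FAILED, not NULL, which is why the helper translates the result.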

One of the questions you link to has some references on this system-specific behavior.

Dharmit
Bacon
0

It is one method of mapping new (dynamic) memory into your application. For a libc implementing malloc() (and friends), this is one possible technique for actually obtaining the memory from the kernel.

Stian Skjelstad
  • Thanks for the reply. But malloc is enough for allocating memory, right? Why would one use mmap with MAP_ANONYMOUS (and fd = -1) to allocate memory? So mmap has a purpose besides operating on files: dynamically allocating memory. But why mmap rather than malloc? Pros and cons, and any other insights about mmap for this usage, would be greatly appreciated. – ANTHONY May 09 '16 at 12:56
  • malloc() will probably suit you well. Special projects that want to provide their own memory allocation implementation can use this, such as Electric Fence, Valgrind, and UML. – Stian Skjelstad May 09 '16 at 13:01
  • Yes. But even those implementations could be built on malloc, right? What special advantage or flexibility does mmap give? – ANTHONY May 09 '16 at 13:08
  • Probably nothing, because `malloc`, for a huge amount of memory, will itself use an `anonymous mmap` to allocate the memory. – LPs May 09 '16 at 13:11