Why can we mmap a file with a length that exceeds the file size?

Question

For example:

int fd = ::open("/test.txt", O_RDONLY);
struct stat buf;
fstat(fd, &buf);
char* addr = (char*)::mmap(NULL, buf.st_size + 10, PROT_READ, MAP_PRIVATE | MAP_POPULATE, fd, 0);

Notice that I mapped st_size + 10 bytes here, yet the call still succeeds.

Why doesn't the system apply any check? Is this dangerous?

Thanks

@coderredoc mmap() can be called equally well from both c and c++. — Jeremy Friesner, Dec 02 '17 at 05:10
@JeremyFriesner: Yes, but I was not sure about the use of `::`. That's why I said it. — user2736738, Dec 02 '17 at 05:11

nachiketkulk · Answer 1 · 2017-12-02T07:45:52.617

The signature of mmap is:

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

To quote Michael Kerrisk:

The length argument specifies the size of the mapping in bytes. Although length doesn’t need to be a multiple of the system page size (as returned by sysconf(_SC_PAGESIZE)), the kernel creates mappings in units of this size, so that length is, in effect, rounded up to the next multiple of the page size. - The Linux Programming Interface (Chapter 49)

To quote Robert Love:

The mmap( ) system call operates on pages. Both the addr and offset parameters must be aligned on a page-sized boundary. That is, they must be integer multiples of the page size. Mappings are, therefore, integer multiples of pages. If the len parameter provided by the caller is not aligned on a page boundary—perhaps because the underlying file’s size is not a multiple of the page size—the mapping is rounded up to the next full page. The bytes inside this added memory, between the last valid byte and the end of the mapping, are zero-filled. Any read from that region will return zeros. Any writes to that memory will not affect the backing file, even if it is mapped as MAP_SHARED. Only the original len bytes are ever written back to the file. - Linux System Programming (Chapter 4)
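The rounding both quotes describe can be sketched as plain arithmetic. This is only an illustration: the helper names `round_up_to_page` and `effective_mapping_length` are hypothetical, and the page size is queried with sysconf(_SC_PAGESIZE) as mentioned in the first quote.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>

// Round `length` up to the next multiple of `page_size`.
std::size_t round_up_to_page(std::size_t length, std::size_t page_size) {
    return ((length + page_size - 1) / page_size) * page_size;
}

// Hypothetical helper: the number of bytes effectively covered by a
// mapping request of `length` bytes on the current system.
std::size_t effective_mapping_length(std::size_t length) {
    const long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);  // sysconf returns -1 on error
    return round_up_to_page(length, static_cast<std::size_t>(page));
}
```

With a 4096-byte page, a request for 4097 bytes covers two pages (8192 bytes), while a request for exactly 4096 bytes covers one.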

This shows that the mmap memory can always be larger than requested. But the OP's question is rather about what happens if the mmap memory is strictly larger than the file backing it up, e.g., will a memory access beyond the file size create a fault? Will it silently extend the file by zeroes? Etc. — Hagen von Eitzen, Dec 02 '17 at 04:38
The question is asking something more along the lines of : imagine we mmap() an empty file with a size = 10Kbyte, and write to offset 9K. Does that get saved to the file? — Riking, Jun 11 '18 at 21:38

Basile Starynkevitch · Answer 2 · 2017-12-02T08:32:41.343

I assume your system is running Linux. Be sure to read intro(2).

We can mmap(2) a file with a length beyond its size because, if we couldn't, only files whose size is an exact multiple of the page size (generally 4 Kbytes, sometimes larger; see sysconf(3) with _SC_PAGESIZE) could be memory mapped. If that were the case, memory-mapped files would be much less useful. Also, the size of an mmap-ed file can vary over time (other processes write(2)-ing and appending to it, calls to ftruncate(2), etc.), so it makes no sense for the kernel to require (or enforce) that it does not change.

Read the documentation of mmap(2) carefully; it says:

A file is mapped in multiples of the page size. For a file that is not a multiple of the page size, the remaining memory is zeroed when mapped, and writes to that region are not written out to the file.

(So of course the kernel is doing some checks, probably many more than you imagine.)
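The zero-filled tail described in that passage can be observed with a small experiment. This is a sketch under assumptions: the function name `tail_past_eof_is_zero` and the scratch path under /tmp are made up for illustration, and the check deliberately stays inside the file's last page. Per mmap(2), touching a page that lies wholly beyond the end of the file raises SIGBUS, which is the dangerous case the question asks about.

```cpp
#include <cassert>
#include <cstddef>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Creates a small temporary file, maps `extra` bytes more than its size,
// and returns true if every mapped byte past end-of-file reads as zero.
// Only meaningful while size + extra stays within the file's last page.
bool tail_past_eof_is_zero(std::size_t extra) {
    char path[] = "/tmp/mmap_tail_XXXXXX";  // hypothetical scratch location
    int fd = mkstemp(path);
    if (fd == -1) return false;
    unlink(path);                            // file disappears once fd is closed

    const char data[] = "hello";
    const std::size_t size = sizeof data - 1;  // 5 bytes, no trailing NUL written
    if (write(fd, data, size) != static_cast<ssize_t>(size)) {
        close(fd);
        return false;
    }

    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    assert(size + extra <= page);            // stay inside the single backing page

    void* p = mmap(nullptr, size + extra, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                               // the mapping outlives the descriptor
    if (p == MAP_FAILED) return false;

    const char* addr = static_cast<const char*>(p);
    bool ok = (addr[0] == 'h');              // file content is visible
    for (std::size_t i = size; i < size + extra; ++i)
        ok = ok && (addr[i] == 0);           // bytes past EOF read as zero

    munmap(p, size + extra);
    return ok;
}
```

Calling `tail_past_eof_is_zero(10)` mirrors the `st_size + 10` request from the question on a 5-byte file.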

Also, mmap could fail, so your code should check for that, e.g. by following the call with:

if ((void*)addr == MAP_FAILED) {
    perror("mmap");
    exit(EXIT_FAILURE);
}
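The same check can be folded into a small wrapper that also checks open(2). This is a minimal sketch, not a definitive implementation: `map_readonly` is a hypothetical helper name, and it reports failure by returning nullptr with errno preserved instead of exiting.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: map `length` bytes of `path` read-only.
// Returns the mapped address, or nullptr with errno describing the
// failure of open() or mmap(). The caller must munmap(addr, length).
const char* map_readonly(const char* path, std::size_t length) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd == -1) return nullptr;      // errno set by open()

    void* p = mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
    const int saved = errno;           // close() may overwrite errno
    close(fd);                         // the mapping stays valid after close
    if (p == MAP_FAILED) {
        errno = saved;
        return nullptr;
    }
    return static_cast<const char*>(p);
}
```

A caller would test the result against nullptr and then use perror or strerror(errno) to report the cause.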

BTW, your question is not C++ specific but is POSIX or Linux specific (other operating systems might not provide memory mapped files, or could put other constraints on them).

Note that memory mapping is very common. Mappings are created by mmap and also at execve(2) time. You can inspect the virtual address space of a given process through /proc/ (see proc(5), and try cat /proc/self/maps and cat /proc/$$/maps in your terminal). And mmap is used quite often: by malloc(3) and operator new, by dlopen(3), and by the dynamic linker ld-linux(8) when loading shared libraries.
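The /proc/self/maps file can also be read from inside a program. The following sketch is Linux-specific and assumes /proc is mounted; `count_own_mappings` is a hypothetical name, and the function only counts lines rather than parsing them.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Counts the memory mappings of the calling process by reading
// /proc/self/maps, one mapping per line. Returns 0 if the file
// cannot be opened (for example on a non-Linux system).
std::size_t count_own_mappings() {
    std::ifstream maps("/proc/self/maps");
    std::size_t n = 0;
    for (std::string line; std::getline(maps, line); )
        if (!line.empty()) ++n;
    return n;
}
// Example use: assert(count_own_mappings() > 0);
```

Even a trivial program typically shows several mappings: its own executable, the C and C++ runtime libraries, the heap, and the stack.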

Also read a book on Linux or POSIX programming (e.g. the old Advanced Linux Programming, freely downloadable, or something newer) and Operating Systems: Three Easy Pieces.
