How is mmap() supposed to behave if the requested mapping size is larger than the file?

Question

To my understanding, mmap() is for bulk loading of files into memory. It can be used like this:

// Assuming PAGE_SIZE == 4096
int fd = open("/some/large/resource", O_RDONLY);
void *buffer = mmap(NULL, 8192, PROT_READ, MAP_SHARED, fd, 0);
// Use buffer
munmap(buffer, 8192);

That should achieve similar effect to, and be ostensibly faster than, this:

int fd = open("/some/large/resource", O_RDONLY);
void *buffer = malloc(8192);
read(fd, buffer, 8192);
// Use buffer
free(buffer);

What would happen if /some/large/resource were in fact not 8192 bytes but 4096 bytes, one entire page smaller? In the second case, read() would return 4096, indicating that many bytes were successfully read, and the rest of the buffer would be indeterminate but valid, writable space.

But what would (should?) happen in the mmap() case? And does what happens depend on the flags, i.e., if I used MAP_PRIVATE instead of MAP_SHARED? Would the memory in the next page be inaccessible? Would the memory be accessible but indeterminate as in the malloc()/read() case? Would the memory be zeroed out?

Disclaimer: This is a follow-up to "How to correctly use mmap() and newBufferWithBytesNoCopy together?" I am asking this new question to determine whether the problem described in the older question is an mmap() problem or some other problem.

For reference: [`mmap()`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html) — Oka, Jun 14 '23 at 02:10

score 3 · Accepted Answer · answered Jun 14 '23 at 08:34

Accessing a page that is inside the mmap region but lies entirely beyond the end of the file causes a bus error.
The mmap call itself succeeds and returns a pointer that looks valid.
But as soon as you access a page beyond the end of the file, you get SIGBUS.
From man mmap:

SIGBUS Attempted access to a page of the buffer that lies beyond the end of the mapped file.
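To make that concrete, here is a minimal sketch that reproduces the situation from the question. It assumes a POSIX system where /tmp is writable; the function name second_page_raises_sigbus and the temporary file name are made up for illustration. It creates a file exactly one page long, maps two pages, and reports whether touching the second page raised SIGBUS.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf jump_env;

/* SIGBUS handler: unwind back to the sigsetjmp() call below. */
static void on_sigbus(int sig) {
    (void)sig;
    siglongjmp(jump_env, 1);
}

/* Hypothetical demo helper (not part of any library).
 * Returns 1 if reading the page past EOF raised SIGBUS,
 * 0 if the read succeeded, -1 if the setup failed. */
int second_page_raises_sigbus(void) {
    long page = sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/mmap_demo_XXXXXX";   /* assumed writable location */
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                            /* file disappears once unmapped */
    if (ftruncate(fd, (off_t)page) != 0) { close(fd); return -1; }

    /* The file is one page long, but we ask for two pages. */
    size_t len = 2 * (size_t)page;
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED) return -1;
    volatile const char *buf = map;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0) { munmap(map, len); return -1; }

    int faulted;
    if (sigsetjmp(jump_env, 1) == 0) {
        assert(buf[0] == 0);   /* first page is backed by the file */
        (void)buf[page];       /* second page lies wholly past EOF */
        faulted = 0;
    } else {
        faulted = 1;           /* we arrived here via the handler */
    }

    sigaction(SIGBUS, &old, NULL);
    munmap(map, len);
    return faulted;
}
```

On a system that follows the man page text quoted above, I would expect second_page_raises_sigbus() to return 1. Jumping out of a signal handler like this is only reasonable for a demo; real code should avoid touching those pages in the first place.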

XbzOverflow

Is it possible to circumvent that? I.e., an argument or setting that would make the data valid zeroed memory instead of triggering a bus error? – user16217248 Jun 14 '23 at 17:24
@user16217248 I didn't find a direct way to do that. Perhaps you can do it manually, e.g. mmap one region covering the minimum number of pages the file needs, and then mmap with MAP_ANONYMOUS and MAP_FIXED right after it to concatenate the two regions so they behave the way you want. – XbzOverflow Jun 15 '23 at 06:51
Interesting, thanks for sharing. – user16217248 Jun 15 '23 at 06:52
Is there any way to find out where the pages are valid? Aside from separately measuring the size of the file? – user16217248 Jun 16 '23 at 23:25
@user16217248 I haven't found an easier way so far. – XbzOverflow Jun 17 '23 at 01:19
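The MAP_ANONYMOUS plus MAP_FIXED idea from the comments could look roughly like the sketch below. It is only one way to arrange it, and a few things are assumptions: map_zero_padded is a made-up helper name, the mapping is read-only and starts at file offset 0, and the file size is taken from fstat(). It maps the full range from the file first (so the address range is already reserved), then overlays every page that lies wholly past the end of the file with anonymous zero pages.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not a real API): map `len` bytes of `fd` read-only,
 * with pages wholly beyond EOF replaced by anonymous zero pages.
 * Returns MAP_FAILED on error. */
void *map_zero_padded(int fd, size_t len) {
    struct stat st;
    if (len == 0 || fstat(fd, &st) != 0) return MAP_FAILED;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    /* Round both the request and the file size up to whole pages. */
    size_t total = (len + page - 1) / page * page;
    size_t file_len = ((size_t)st.st_size + page - 1) / page * page;
    if (file_len > total) file_len = total;

    /* Step 1: map the whole range from the file. This succeeds even when
     * the range extends past EOF; only touching those pages would fault. */
    char *base = mmap(NULL, total, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) return MAP_FAILED;

    /* Step 2: overlay the pages past EOF with zero-filled anonymous memory.
     * MAP_FIXED is safe here because the range already belongs to us. */
    if (file_len < total) {
        void *tail = mmap(base + file_len, total - file_len, PROT_READ,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (tail == MAP_FAILED) {
            munmap(base, total);
            return MAP_FAILED;
        }
    }
    return base;
}
```

The caller would release the result with munmap() using the page-rounded length. Note that the split point is fixed when the mapping is created: if the file grows later, the new data does not appear in the zero-padded tail, and if it shrinks, the file-backed pages past the new end can raise SIGBUS again.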
