Here is the example code I wrote.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main()
{
    int fd;
    long pagesize;
    char *data;

    if ((fd = open("foo.txt", O_RDONLY)) == -1) {
        perror("open");
        return 1;
    }

    pagesize = sysconf(_SC_PAGESIZE);
    printf("pagesize: %ld\n", pagesize);

    data = mmap(NULL, pagesize, PROT_READ, MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("data: %p\n", (void *) data);

    printf("%d\n", data[0]);
    printf("%d\n", data[1]);
    printf("%d\n", data[2]);
    printf("%d\n", data[4096]);
    printf("%d\n", data[4097]);
    printf("%d\n", data[4098]);

    return 0;
}

If I provide a zero-byte foo.txt to this program, it terminates with SIGBUS.

$ > foo.txt && gcc foo.c && ./a.out 
pagesize: 4096
data: 0x7f8d882ab000
Bus error

If I provide a one-byte foo.txt to this program, there is no such issue.

$ printf A > foo.txt && gcc foo.c && ./a.out 
pagesize: 4096
data: 0x7f5f3b679000
65
0
0
48
56
10

mmap(2) mentions the following.

Use of a mapped region can result in these signals:

SIGSEGV Attempted write into a region mapped as read-only.

SIGBUS Attempted access to a portion of the buffer that does not correspond to the file (for example, beyond the end of the file, including the case where another process has truncated the file).

So if I understand this correctly, even the second test case (1-byte file) should have led to SIGBUS because data[1] and data[2] are trying to access a portion of the buffer (data) that does not correspond to the file.

Can you help me understand why only a zero byte file causes this program to fail with SIGBUS?

Lone Learner
  • @Olaf While reading the [man page](https://linux.die.net/man/2/mmap) and the [POSIX documentation](http://pubs.opengroup.org/onlinepubs/009695399/functions/mmap.html), I could not be sure that I am indeed invoking undefined behaviour. The man page makes no mention of such behaviour being undefined. Neither does the POSIX documentation. As per my interpretation of both documents, I should get `SIGBUS` for both the tests in my question. – Lone Learner Jan 01 '17 at 14:10
  • @Olaf I don't have any problem per se. I have a question out of curiosity. The question can be summarized as: the `man` page on Linux indicates that any access to a portion of the buffer that does not correspond to the file (for example, beyond the end of the file) should lead to SIGBUS. But my second test seems to contradict what the `man` page says on the same Linux system. – Lone Learner Jan 01 '17 at 14:24
  • Completely valid reason to ask a question; curiosity should be encouraged. I don't think it's constructive to go around saying questions aren't worth asking. – Daniel Porteous Jan 01 '17 at 14:40
  • Accessing a page that's mapped, where the entire page is beyond the end of the file mapped, doesn't appear to be UB. The [POSIX standard for `mmap`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html) clearly states that mappings larger than the underlying file are allowed, but accessing such a page can result in `SIGBUS`: *The `mmap()` function can be used to map a region of memory that is larger than the current size of the object. Memory access within the mapping but beyond the current end of the underlying objects may result in `SIGBUS` signals being sent to the process.* – Andrew Henle Jan 01 '17 at 14:41

2 Answers

You get SIGBUS when you access a mapped page that lies entirely past the end of the file, because the POSIX standard states:

The mmap() function can be used to map a region of memory that is larger than the current size of the object. Memory access within the mapping but beyond the current end of the underlying objects may result in SIGBUS signals being sent to the process.

With a zero-byte file, the entire page you mapped is "beyond the current end of the underlying object". So you get SIGBUS.

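You do NOT get a SIGBUS when you go beyond the 4kB page you've mapped because that address is not within your mapping at all; what happens there depends on whatever else, if anything, is mapped at that address. You don't get a SIGBUS accessing your mapping when your file is larger than zero bytes because the page containing the end of the file is mapped in full.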
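
The rule described above can be written down as a little arithmetic. The following is only a sketch of that rule as observed on Linux, not anything taken from the kernel or from POSIX; `readable_bytes` is a hypothetical helper name, and it assumes the mapping starts at file offset 0, that `map_len` is a multiple of the page size, and that the page size is a power of two.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of any library): given the file size, the
 * length passed to mmap() and the page size, return how many bytes at the
 * start of the mapping are backed by the file and so can be read without
 * SIGBUS.
 * Assumptions: mapping offset is 0, map_len is a multiple of page_size,
 * page_size is a power of two, and no arithmetic overflow occurs.
 */
size_t readable_bytes(size_t file_size, size_t map_len, size_t page_size)
{
    /* Round the file size up to the next page boundary. */
    size_t backed = (file_size + page_size - 1) & ~(page_size - 1);

    /* Never report more than what was actually mapped. */
    return backed < map_len ? backed : map_len;
}
```

With a 4096-byte page, a zero-byte file gives 0 readable bytes, while a 1-byte file gives 4096 whether one page or two pages were mapped.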

But you would get a SIGBUS if you mapped additional pages past the end of the file, such as mapping two 4kB pages for a 1-byte file. If you access that second 4kB page, you'd get SIGBUS.

Andrew Henle
  • This answer seems to explain the behaviour I observe very accurately. I changed my `mmap()` call to map `2 * pagesize` bytes instead of `pagesize` and indeed accessing `data[4096]` led to `SIGBUS` with this change. – Lone Learner Jan 01 '17 at 14:53

A 1-byte file does not lead to the crash because mmap will map memory in multiples of the page size and zero the remainder. From the man page:

A file is mapped in multiples of the page size. For a file that is not a multiple of the page size, the remaining memory is zeroed when mapped, and writes to that region are not written out to the file. The effect of changing the size of the underlying file of a mapping on the pages that correspond to added or removed regions of the file is unspecified.
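
The zero-filling described in that passage can be observed directly. This is a minimal sketch assuming Linux with a writable `/tmp`; `tail_is_zero_filled` is a hypothetical helper name, not an existing API.

```c
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: write the single byte 'A' to a temporary file, map
 * one page of it, and check that byte 0 is 'A' and every other byte in the
 * page is 0.
 * Returns 1 if so, 0 if not, -1 on a setup error.
 */
int tail_is_zero_filled(void)
{
    char path[] = "/tmp/mmap_tail_XXXXXX";
    long pagesize = sysconf(_SC_PAGESIZE);
    unsigned char *data;
    int fd, ok;
    long i;

    if (pagesize <= 0 || (fd = mkstemp(path)) == -1)
        return -1;
    unlink(path);                       /* file lives on until fd is closed */
    if (write(fd, "A", 1) != 1) {
        close(fd);
        return -1;
    }

    data = mmap(NULL, (size_t) pagesize, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    /* First byte comes from the file, the rest of the page should be zero. */
    ok = (data[0] == 'A');
    for (i = 1; ok && i < pagesize; i++)
        ok = (data[i] == 0);

    munmap(data, (size_t) pagesize);
    return ok;
}
```

This matches the `65`, `0`, `0` printed for `data[0]`, `data[1]` and `data[2]` in the question's second test.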

Marcus Ilgner
  • I am not convinced this reasoning is correct. If I try to print `data[4096]`, `data[4097]`, etc. in the second test case, I get some garbage values printed. I do not get `SIGBUS` which I should have if the reasoning presented in this answer held good. – Lone Learner Jan 01 '17 at 14:18
  • ... up to 4095, 4kB - 1 – 4pie0 Jan 01 '17 at 14:19
  • That depends on the OS configuration and defaults. I just did a `getconf PAGESIZE` on my Windows 10 machine and it returned 65536 bytes. – Marcus Ilgner Jan 01 '17 at 14:20
  • Oh, sorry, I was talking about Linux... In case of Windows, well I am not surprised if it crashed when you tried it. – 4pie0 Jan 01 '17 at 14:21
  • On my system the page size is 4096 as recorded in the example output I have included in my question. – Lone Learner Jan 01 '17 at 14:22
  • Lone Learner, but mmap keeps metadata about the bytes it has allocated (you can make the memory executable, read-only, read/write, etc.), so it knows something about them. In this case you assume mmap does nothing about a bad access, and that is why accessing data[1] is legal even if you asked for 1 byte. But then the answer is not the 4kB granularity alone; it is that granularity plus the fact that mmap does nothing about it. – 4pie0 Jan 01 '17 at 14:26
  • @Lone Learner: Just read the updated question. My only guess is that you're lucky and somehow `data[4096]` ends up pointing to some valid location somewhere else in your process' virtual memory, outside of the area that `mmap` is responsible for. I wouldn't count on this being 100% reproducible though. As Olaf already wrote in his comment to the question this is UB-land. – Marcus Ilgner Jan 01 '17 at 14:35
  • @ma_il I think [Andrew's answer](http://stackoverflow.com/a/41416233/1175080) accurately explains what behaviour is well defined to cause `SIGBUS` and my second test indeed does not seem to fall within the scope of this defined behaviour. – Lone Learner Jan 01 '17 at 14:48