-1

I learned that pages allocated by mmap() can be unusable even if the call returned success. For example, this happens when a file is mapped and the len passed to mmap() is sufficiently larger than the file. In that case, any access to a page beyond the last page backed by the file, even if within the requested size, raises SIGBUS.

Is there any way to directly determine whether all the bytes from the beginning of the mapping up to len can be accessed safely? Can I do this without manually measuring the length of the file, setting up a SIGBUS handler, or otherwise (exhaustively) checking every reason a page might be inaccessible?

// Assuming PAGE_SIZE == 4096
int fd = open("file", O_RDONLY);
void *buffer = mmap(NULL, 8192, PROT_READ, MAP_SHARED, fd, 0);
// Is there any way to determine if reading bytes 4096-8191 will cause a bus error?
// Besides checking whether the size of the file is greater than 4096?

Rather than checking the file size to determine if any pages would be unavailable because the file is too small, can I directly determine if any pages are unavailable for any reason?

user16217248
  • What is the difference between "query the number of accessible/usable pages" and "measuring the length of the file"? Seems a bit X/Y. – pmacfarlane Jun 17 '23 at 00:03
  • @pmacfarlane X/Y Problem: Asking how to do Y because you think you can do X if you can do Y, but you really should be asking how to do X. My *X* in this case is *error checking* `mmap()` regions so that they can be processed or fail gracefully instead of getting a `SIGBUS`. – user16217248 Jun 17 '23 at 00:05
  • It sounds like you're asking "I want to do out-of-bounds array accesses in C, but I don't want undefined behaviour or a seg-fault". – pmacfarlane Jun 17 '23 at 00:13
  • @pmacfarlane Not really. I am asking to find out what the bounds of the *'array' is* so I can fail gracefully if it isn't big enough. – user16217248 Jun 17 '23 at 00:14
  • I'm sure you can do that with `fstat()` and friends, or muck about with `fseek()` and `ftell()`. – pmacfarlane Jun 17 '23 at 00:15
  • @pmacfarlane My issue is that this would *'indirectly'* check if the problem (pages being inaccessible) *would have* occurred for a *specific reason*. Can I check if the problem *occurred* for *any* reason? – user16217248 Jun 17 '23 at 00:19
  • I think your question does not match what you are saying in the comments. Maybe consider editing your question. I won't comment any more because it's hassling me into a chat I don't want to do. – pmacfarlane Jun 17 '23 at 00:22
  • Use `fstat` to get the file's size. Then, use `ftruncate` to grow/shrink the file. Then, do the `mmap`. – Craig Estey Jun 17 '23 at 00:49
  • I posted it before, and I'll post it again: [How do you determine the size of a file in C?](https://stackoverflow.com/questions/8236/how-do-you-determine-the-size-of-a-file-in-c). Unless your question changes, I think this is the answer. – pmacfarlane Jun 17 '23 at 01:14
  • @pmacfarlane My question never changed, and that is not the answer. The answer might be *'you can't'* and one alternative is that. Here is an analogy. For `malloc()`, I could query the available memory of the OS to determine if it will fail, or I could just check its return value. For `mmap()`, I could query the file size to see if the pages will be unavailable, but there might be a more direct way to determine if all pages are available, which is what I'm looking for. – user16217248 Jun 17 '23 at 01:16
  • @user16217248 What's wrong with measuring the size of the file? You already have an open file descriptor, and compared to the time it takes for the system to do the memory-mapping gyrations necessary to do the actual mapping and then page in the file data, one extra access to the already-cached-because-you-just-opened-it file metadata to get the length will literally be unmeasurable. – Andrew Henle Jun 17 '23 at 01:23
  • @AndrewHenle So the *canonical way* is to just separately check if the file is large enough? – user16217248 Jun 17 '23 at 01:29
  • @user16217248 Pretty much. If you're on Linux, run something like `strace -o /tmp/out.txt ls`. You'll see the run-time linker loading all the libraries the `ls` executable needs with a bunch of `open()`/`fstat()`/`mmap()` calls. – Andrew Henle Jun 17 '23 at 01:34

1 Answer

0

The canonical way is to use fstat() to measure the size of the file and fail if the file is too small. There is no more 'direct' way to query the availability of pages. For this case there is little need for one: under Signals, the manpage documents only one SIGBUS condition for a mapping, which is access beyond the end of the mapped file:

SIGBUS Attempted access to a page of the buffer that lies beyond
      the end of the mapped file.  For an explanation of the
      treatment of the bytes in the page that corresponds to the
      end of a mapped file that is not a multiple of the page
      size, see NOTES.
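The rule quoted above can be turned into arithmetic: the usable extent of a mapping that starts at file offset 0 is the file size rounded up to a whole number of pages. The following is a minimal sketch of that calculation; the helper name `accessible_bytes` is hypothetical and not part of any library, and the page size is passed in as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper (not a library function): number of bytes, counted
 * from the start of a mapping at file offset 0, that can be touched
 * without SIGBUS for a file of file_size bytes. The page holding the last
 * byte of the file is fully accessible; the bytes past end-of-file inside
 * that page read as zero. */
static off_t accessible_bytes(off_t file_size, off_t page_size)
{
    assert(file_size >= 0 && page_size > 0);
    off_t pages = (file_size + page_size - 1) / page_size; /* round up */
    return pages * page_size;
}
```

With a 4096-byte page, a 4092-byte file gives 4096 accessible bytes, so an 8192-byte mapping of it would fault anywhere in its second page.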

The following code is an example of how mmap() could be safely used:

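#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>

// Assuming PAGE_SIZE == 4096, as in the question
#define PAGE_ALIGN(S) (((S) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
#define EXPECTED_SIZE /* ... */

int fd = open("file", O_RDONLY);
if (fd == -1)
    return EXIT_FAILURE;
struct stat statbuf;
// Every page up to and including the one holding the last byte of the file is accessible.
if (fstat(fd, &statbuf) || PAGE_ALIGN(statbuf.st_size) < EXPECTED_SIZE)
    return EXIT_FAILURE;
void *mapping = mmap(NULL, EXPECTED_SIZE, PROT_READ, MAP_SHARED, fd, 0);
if (mapping == MAP_FAILED)
    return EXIT_FAILURE;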
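For reference, the same check can be wrapped in one function. This is only a sketch under some assumptions: `map_checked` is a hypothetical name, the file is a regular file mapped from offset 0, and the page size is queried with sysconf(_SC_PAGESIZE) rather than a PAGE_SIZE macro. The check is valid only at the moment fstat() runs; if another process truncates the file afterwards, accesses can still raise SIGBUS.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the first len bytes of path read-only.
 * Returns NULL if the file cannot be opened, if it does not back every
 * requested page, or if mmap() itself fails. */
static void *map_checked(const char *path, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    void *mapping = NULL;
    if (fstat(fd, &st) == 0) {
        /* Round the file size up to a whole number of pages. */
        off_t backed = (st.st_size + page - 1) / page * page;
        if (len > 0 && (off_t)len <= backed) {
            mapping = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
            if (mapping == MAP_FAILED)
                mapping = NULL;
        }
    }
    close(fd); /* an established mapping stays valid after close() */
    return mapping;
}
```

A caller would treat a NULL result as "fail gracefully" and release a successful mapping with munmap(mapping, len) when done.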
user16217248