
I'm thinking about re-implementing the malloc(3) function (as well as free(3), realloc(3) and calloc(3)) using mmap(2) and munmap(2) syscalls (as sbrk(2) is now deprecated on some operating systems).

My strategy to allocate memory blocks on the page returned by mmap(2) would be to store metadata right before the block of data. Thus the metadata could consist of 3 attributes:

  • is_free : a char (1 byte) to tell whether the block is considered free or not;
  • size : a size_t (4 bytes) holding the size of the block in bytes;
  • next : a pointer (1 byte) to the next block's metadata (or to the first block of the next page if there's no more space after the block).

But as I can't use malloc to allocate a struct for them, I would simply consider putting 6 bytes of metadata in front of the block each time I create one:

+---------+---------+--------+------------------+---------+---------+--------+---------+---
| is_free |  size   |  next  |      Block1      | is_free | size    |  next  | Block2  | ...
+---------+---------+--------+------------------+---------+---------+--------+---------+---
| 1 byte  | 4 bytes | 1 byte |     n bytes      | 1 byte  | 4 bytes | 1 byte | m bytes | ...
+---------+---------+--------+------------------+---------+---------+--------+---------+---
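
As a rough illustration of the layout above, the header could be written as a C struct placed directly in the mapped memory. This is only a sketch under assumptions: the names `block_header`, `header_span`, `payload_of` and `header_of` are hypothetical, and the real sizes depend on the platform. On typical 64-bit systems a `size_t` and a pointer take 8 bytes each, so the header is larger than 6 bytes, and rounding it up to `alignof(max_align_t)` keeps the returned payload pointer suitably aligned.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header; the fields mirror the three attributes listed above. */
struct block_header {
    size_t size;                /* payload size in bytes */
    struct block_header *next;  /* next block's header, or NULL */
    char is_free;               /* 1 if the block is free, 0 otherwise */
};

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Bytes reserved in front of each payload so the payload stays aligned. */
static size_t header_span(void)
{
    return round_up(sizeof(struct block_header), alignof(max_align_t));
}

/* True if p is aligned strictly enough for any standard object type. */
static int is_aligned(const void *p)
{
    return (uintptr_t)p % alignof(max_align_t) == 0;
}

/* Payload address for a header that itself sits on an aligned address. */
static void *payload_of(struct block_header *h)
{
    return (unsigned char *)h + header_span();
}

/* Header address recovered from a payload pointer, as free() would need. */
static struct block_header *header_of(void *payload)
{
    return (struct block_header *)((unsigned char *)payload - header_span());
}
```

With this arrangement, the pointer handed to the user is `payload_of(h)`, and `free` can get back to the metadata with `header_of(ptr)` without any separate lookup table.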

How can I be sure the user/process using my malloc won't be able to read/write the metadata of the blocks with such a layout?

E.g.: with the previous schema, I return the address of Block1's first byte to the user/process. If they do *(Block1 + n) = Some1ByteData, they alter the metadata of the next block, which will cause issues in my allocator when I later try to allocate a new block.
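
The scenario just described can be reproduced safely inside a single array that stands in for the mmap'ed page. This is a minimal sketch with hypothetical names (`block_header`, `page`, `block2_size_after_write`), and the header fields are assumed from the list above. Headers are copied in and out with `memcpy` so that the example does not depend on the alignment of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical header with the three attributes from the question. */
struct block_header {
    size_t size;
    struct block_header *next;
    char is_free;
};

enum { PAYLOAD = 16 };

/* Stand-in for one mapped page: header1, Block1, header2, Block2. */
static unsigned char page[2 * (sizeof(struct block_header) + PAYLOAD)];

/* Lay out two 16-byte blocks, let the "user" write `count` bytes of 0xFF
 * starting at Block1, and return the size then recorded for Block2. */
static size_t block2_size_after_write(size_t count)
{
    struct block_header h1 = { PAYLOAD, NULL, 0 };  /* links left NULL for brevity */
    struct block_header h2 = { PAYLOAD, NULL, 1 };
    unsigned char *block1 = page + sizeof h1;
    unsigned char *header2 = block1 + PAYLOAD;

    /* Keep the demonstration itself inside the array. */
    assert(count <= sizeof page - sizeof h1);

    memcpy(page, &h1, sizeof h1);
    memcpy(header2, &h2, sizeof h2);

    /* In bounds when count <= PAYLOAD; otherwise it spills into header2. */
    memset(block1, 0xFF, count);

    memcpy(&h2, header2, sizeof h2);
    return h2.size;
}
```

Calling `block2_size_after_write(PAYLOAD)` leaves the second header untouched, while `block2_size_after_write(PAYLOAD + 1)` overwrites the first byte of its `size` field, which is exactly the one-byte overrun from the example.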

On the mmap(2) man page I read that I could give protection flags for the pages, but if I use them, then the user/process using my malloc won't be able to use the block I give them. How is this achieved in the real malloc?
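
To make the granularity of those protection flags concrete, here is a small sketch, assuming a Linux-like system where `MAP_ANONYMOUS` is available. The helper name `map_with_guard_page` is hypothetical. It maps two pages and makes only the second one inaccessible with `mprotect(2)`: protection can be changed per page, but not for a few bytes of metadata in the middle of a page that the user must still be able to write to.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map two anonymous pages and make only the second one inaccessible.
 * Stores the page size in *page_size_out and returns the start of the
 * mapping, or NULL on failure. */
static void *map_with_guard_page(size_t *page_size_out)
{
    assert(page_size_out != NULL);

    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return NULL;

    size_t len = 2 * (size_t)page_size;
    unsigned char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Protection applies to whole pages: the first page stays usable,
     * any access to the second one now faults. */
    if (mprotect(base + page_size, (size_t)page_size, PROT_NONE) != 0) {
        munmap(base, len);
        return NULL;
    }

    *page_size_out = (size_t)page_size;
    return base;
}
```

Every byte of the first page remains readable and writable, while touching the second page raises SIGSEGV. The caller releases both pages with `munmap(base, 2 * page_size)`.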

PS: For the moment I'm not aiming for a thread-safe implementation or top-tier performance. I just want something robust and functional.

TylerH
Karzyfox
  • Re “How is it achieve in the real malloc ?”: It is not achieved. Common hardware does not support memory protection below the page level, and implementations of `malloc` are subject to clients damaging their data. – Eric Postpischil Dec 08 '21 at 15:10
  • Note that the returned pointers will need to be aligned to a `_Alignof(max_align_t)` byte boundary, so you may need some padding after the 6 bytes of metadata. – Ian Abbott Dec 08 '21 at 15:10
  • C deals with the problem of out-of-bounds accesses by calling it "undefined behavior". – Ian Abbott Dec 08 '21 at 15:15
  • @IanAbbott Well, here the metadata are at accessible addresses. It's just that I would not like the user to mess with them. Do you mean that the only way to solve it is to separate the metadata from the data and put them in different pages? – Karzyfox Dec 08 '21 at 15:17
  • @Karzyfox exactly. – Marcus Müller Dec 08 '21 at 15:22
  • No, I mean the existing implementations do not attempt to solve the problem of corrupted meta-data (but special purpose implementations may provide help to track down such bugs). – Ian Abbott Dec 08 '21 at 15:23
  • Since your library is running in the same process as the application, anything it can do can also be done by the application itself. So if you protected that memory, then your library wouldn't be able to update it, either. – Barmar Dec 08 '21 at 16:02
  • Existing implementations detect certain cases when their metadata has been corrupted and [tell you about it](https://stackoverflow.com/q/57922372/14215102). –  Dec 08 '21 at 16:05
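
One simple way to detect (not prevent) the kind of corruption mentioned in the comments is to store a redundant check value next to the size and verify it before trusting the header. The sketch below is an assumption for illustration only: the names `checked_header`, `HEADER_MAGIC`, `header_init`, `header_is_intact` and `checked_size` are hypothetical, and it does not reproduce the checks of any particular libc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define HEADER_MAGIC ((size_t)0x5A5A5A5Au)  /* arbitrary illustrative value */

/* Hypothetical header that carries a check value derived from the size. */
struct checked_header {
    size_t size;   /* payload size in bytes */
    size_t check;  /* size XOR HEADER_MAGIC, verified before each use */
};

/* Fill in a fresh header for a block of `size` bytes. */
static void header_init(struct checked_header *h, size_t size)
{
    assert(h != NULL);
    h->size = size;
    h->check = size ^ HEADER_MAGIC;
}

/* Return 1 if the stored check value still matches the stored size. */
static int header_is_intact(const struct checked_header *h)
{
    assert(h != NULL);
    return (h->size ^ HEADER_MAGIC) == h->check;
}

/* Return the block size, aborting with a message if the header was altered. */
static size_t checked_size(const struct checked_header *h)
{
    if (!header_is_intact(h)) {
        fputs("allocator: corrupted block header\n", stderr);
        abort();
    }
    return h->size;
}
```

A stray write that changes `size` without also updating `check` is caught on the next `checked_size` call. A write that happens to keep both fields consistent goes unnoticed, so this only catches accidental overruns, not deliberate tampering.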

0 Answers