0

I have a program where it makes sense for me to allocate my own heap, as I have a certain read operation that could take some indeterminate amount of space that I need to store into memory contiguously for ease of ease. I understand that libc sets up the heap for malloc and friends, and that calling sbrk and brk could mess with that. I don't intend to use malloc or the heap otherwise in my program.

Put simply, I want to have a "heap"-like area of memory, where I can request the operating system to grow it as needed, and it is not backed by the file system.

Even if only for learning purposes, how can I:

  • prevent heap set-up by libc
  • set up my own heap
  • extend it properly using sbrk and brk
user129393192
  • 797
  • 1
  • 8
  • I'm not sure how that is contradictory. From my understanding, `sbrk` requests the OS to grow the heap @chux-ReinstateMonica – user129393192 Jun 13 '23 at 04:00
  • Perhaps I should label it POSIX @chux-ReinstateMonica? – user129393192 Jun 13 '23 at 04:10
  • Oh, I understand the concept of a dynamic memory allocator @chux-ReinstateMonica. My question is specifically: How do I prevent `libc` set-up of the heap and set up my own with a certain starting size that I can then use? If you can direct me to documentation, I'd happily read it and make a coding attempt. – user129393192 Jun 13 '23 at 04:22
  • 1
    As long as you don't use `brk` to unallocate a block that libc is using, and as long as you don't run off the end of a block you allocate with `sbrk`, you should be fine calling `sbrk` to allocate memory blocks. – Chris Dodd Jun 13 '23 at 04:30
  • Do you know of a way to prevent the `libc` initial heap-set up @ChrisDodd ? – user129393192 Jun 13 '23 at 04:31
  • libc might allocate a block or three with sbrk, but as long as you don't assume blocks you're getting are contiguous, and you don't run off either end of a block you allocate, you should not need to do anything else. – Chris Dodd Jun 13 '23 at 04:32
  • 2
    I have the strong impression this is an [XY Problem](https://xyproblem.info/). You probably should tell us about the _actual_ problem you're trying to solve. – Jabberwocky Jun 13 '23 at 04:51
  • I had originally asked [here](https://stackoverflow.com/questions/76454834/efficient-2-pass-using-heap-memory), but it was closed. I revised/edited to make it about this after the fact. The real problem is just that I'm reading a stream-device and don't know how much space I'll need. – user129393192 Jun 13 '23 at 04:55
  • 4
    The easiest way to deal with that problem is to not try to force everything to be contiguous -- just allocate multiple largish blocks as needed and deal with the fact they might not be contiguous. – Chris Dodd Jun 13 '23 at 05:33
  • What is wrong with the `sbrk` strategy @ChrisDodd ? – user129393192 Jun 13 '23 at 05:35
  • it won't necessarily give you contiguous blocks – Chris Dodd Jun 13 '23 at 05:53
  • How do you mean? From my understandinng, the virtual addresses provided by `sbrk` will be contiguous (perhaps not in physical RAM) – user129393192 Jun 13 '23 at 05:58
  • @user129393192 _"The real problem is just that I'm reading a stream-device and don't know how much space I'll need"_, yes I think we understood this. What I meant is: what _actual_ problem are you trying to solve? What kind of processing are you doing on the input stream? Is the file you read a text file, etc. etc.? – Jabberwocky Jun 13 '23 at 06:34
  • I’m reading a text file and performing a cipher and storing into some other file (or stream) — or potentially even the same one. It requires two passes. @Jabberwocky – user129393192 Jun 13 '23 at 06:38
  • If it's a text file you could just store the individual lines. This is easy and straight forward. If possible I'd implement this and worry about performance later. – Jabberwocky Jun 13 '23 at 06:40
  • 2
    It seems that you are trying to solve a task with the wrong tool. Like using a screwdriver to hammer a nail. – Support Ukraine Jun 13 '23 at 07:30
  • 3
    I also read your first (now closed) question. You never describe what problem(s) you you are facing with a "normal" use memory allocation. Why "invent" something special (and, sorry, rather bizar). To me it seems standard C has all the tools you need for doing a "2 pass" iteration of the input data. – Support Ukraine Jun 13 '23 at 08:12
  • Did you look at `mmap()`? It can be used without files. – 12431234123412341234123 Jun 13 '23 at 10:26
  • *it is not backed by the file system* Why not? – Andrew Henle Jun 13 '23 at 11:28
  • @user129393192 *I’m reading a text file and performing a cipher and storing into some other file (or stream) — or potentially even the same one. It requires two passes.* If you're overwriting the same file, what are you going to do if the process is interrupted in the middle of overwriting the file, such as by power loss? Or a code error that causes something like a `SIGSEGV`? You will have overwritten the beginning of the file, and there is a significant potential for data loss. Do you know how you'd recover from that? If not, don't overwrite the same file. – Andrew Henle Jun 13 '23 at 11:32
  • 1
    @user129393192 Also, about *The real problem is just that I'm reading a stream-device and don't know how much space I'll need* You **can't** do a two-pass implementation when the source is a generic stream of unknown size - there's no guarantee the stream is seekable so you can't be sure you can seek back to the beginning and replay the stream. – Andrew Henle Jun 13 '23 at 11:38
  • @AndrewHenle that was exactly why I was looking to save the data somewhere ... simply calling `sbrk` every time I hit the end of my allocated memory was what I'm looking for. The stream doesn't need to be seekable because I'll save it somewhere on the first pass and keep track of size. – user129393192 Jun 15 '23 at 00:38
  • @SupportUkraine so what do you suggest, if I want to store it in memory in a contiguous location that I can keep extending the end of? That was why I thought `sbrk` and just extending my data segment would be very convenient. I'm also doing this as a learning project so I thought it would be good to learn how that works. – user129393192 Jun 15 '23 at 00:38
  • I don't see how this problem is solved with normal memory allocation, and I have not found a solution. Perhaps something like a linked list of malloc'd buffers? – user129393192 Jun 15 '23 at 01:54
  • @user129393192 You write: " I want to store it in memory in a contiguous location" My question is why? I don't see any reason for that. E.g. ten non-contiguous blocks will do just as well. Or simply use `realloc`. If used correct you probably wont see any performance impact that matters for this task. – Support Ukraine Jun 15 '23 at 04:35
  • So you’re suggesting a linked list type structure or `realloc` if necessary @SupportUkraine? – user129393192 Jun 15 '23 at 05:29
  • 3
    I would start with `realloc`. And double the memory amount on each `realloc`. Perhaps with an upper limit. If `realloc` turns out to be too slow (which I doubt) I would try a linked list of memory trunks and write simple functions to make it appear as one big memory block. I'm sure it will meet the performance requirements. And without any tricky code – Support Ukraine Jun 15 '23 at 07:18
  • *The stream doesn't need to be seekable because I'll save it somewhere on the first pass and keep track of size* If you don't know how much data you're going to get, how do you know it'll fit in memory? – Andrew Henle Jun 15 '23 at 19:17
  • That's the point where the application returns an error @AndrewHenle – user129393192 Jun 15 '23 at 19:21

1 Answers1

0

As said in the comments, you shouldn't do that this way. If the data comes from a file, you may be able to use mmap() to map the file and accessing it like any other memory. The reading from the disk to RAM that is accessible by your program is then handled by the kernel. If that is not possible, you can, for small amounts of data, use large enough buffers and use realloc() when you need more. If you have so large data that this becomes too slow, it is probably better to write a class that handles read and write access to the array and doesn't store it continuously.

But it is possible to write your own heap. You can use mmap() to reserve memory and expand it when needed. You can reserve it like this:

    long ps = sysconf(PAGESIZE);
    if(ps<0) { /*TODO handle error*/ return -1; }
    *start = mmap(NULL,ps*1024,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0);
    if(!start) { /*TODO handle error*/ return -1; }
    
    //Here you have 1024*ps space to handle your own malloc

The value of 1024*ps does not mean your application is actually using that much memory, pages reservation is lazy, meaning the are only reserved after the application tries to read or write from/to it.

On Linux, you could also extend the space by calling mremap(). But this may not work, for example when the same memory is already used somewhere else. There are other problems with this proposal.

To learn more, use man mmap.

  • *But this may not work, for example when the same memory is already used somewhere else.* [It won't just work, it'll stomp on anything in the way](https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/functions/mmap.html): "The mapping established by mmap() shall replace any previous mappings for those whole pages containing any part of the address space of the process starting at pa and continuing for len bytes." – Andrew Henle Jun 13 '23 at 11:20
  • @AndrewHenle Correct, edit it to suggest `mremap()` to avoid that (AFAIK it doesn't suffer from the same problem). – 12431234123412341234123 Jun 13 '23 at 13:19
  • `mremap()` isn't POSIX, though I don't think there's a fully POSIX-compliant way to be guaranteed contiguous memory. `MAP_FIXED` to get a guaranteed mapped address isn't POSIX, for example, it's an extension, albeit a common one. Although I think it might be all moot at this point - my last comment above pointed out a fatal flaw in any two-pass algorithm trying to process a generic stream: there's no guarantee a generic stream is seekable. – Andrew Henle Jun 13 '23 at 16:32