
I posted previously here, but based on the comments, I had the wrong idea. Here is my question stated clearly: when reading from a stream with an unknown number of bytes coming in, what is the best way to save the data in memory for later use?

Here are some ideas I have had:

  • mmap a non-conflicting region of memory, and mmap some more as necessary (downside: possibly annoying to manage)
  • create a linked-list structure of buffers, filling each before allocating the next (possibly easy to manage, given I write the API)
  • allocate (on the stack) an array of pointers to such buffers (say 100 pointers, each buffer allocated with malloc) and hope there is never more than (100 * buffer size) bytes of data to read (the easiest but least robust solution)
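
For reference, the linked-list idea could be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: `CHUNK_SIZE`, `struct chunk`, `read_all_chunks` and `chunk_list_free` are hypothetical names made up for illustration, and it assumes a POSIX `read` on a blocking file descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK_SIZE 4096  /* hypothetical chunk size; tune as needed */

/* One fixed-size buffer in the chain (hypothetical type). */
struct chunk {
    struct chunk *next;
    size_t used;              /* number of bytes filled in data[] */
    char data[CHUNK_SIZE];
};

/* Free every chunk in the list. */
static void chunk_list_free(struct chunk *head)
{
    while (head != NULL) {
        struct chunk *next = head->next;
        free(head);
        head = next;
    }
}

/* Read fd until EOF, filling each chunk before allocating the next.
 * Returns 0 on success, -1 on read or allocation failure.
 * Note: the last chunk in the list may be partially filled or empty. */
static int read_all_chunks(int fd, struct chunk **out_head, size_t *out_total)
{
    struct chunk *head = NULL, *tail = NULL;
    size_t total = 0;

    for (;;) {
        /* Start a new chunk when there is none yet or the current one is full. */
        if (tail == NULL || tail->used == CHUNK_SIZE) {
            struct chunk *c = malloc(sizeof *c);
            if (c == NULL) {
                chunk_list_free(head);
                return -1;
            }
            c->next = NULL;
            c->used = 0;
            if (tail != NULL)
                tail->next = c;
            else
                head = c;
            tail = c;
        }

        ssize_t n = read(fd, tail->data + tail->used, CHUNK_SIZE - tail->used);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            chunk_list_free(head);
            return -1;
        }
        if (n == 0)
            break;                 /* EOF */

        tail->used += (size_t)n;
        assert(tail->used <= CHUNK_SIZE);
        total += (size_t)n;
    }

    *out_head = head;
    *out_total = total;
    return 0;
}
```

Calling `read_all_chunks(STDIN_FILENO, &head, &total)` would slurp standard input; the caller then walks `head->next` and eventually calls `chunk_list_free(head)`. Existing data is never copied as the input grows, at the cost of the result not being contiguous.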

My original idea, which the other post heavily rejected, was:

  • mess with sbrk and brk and manage my own contiguous heap (which could then conflict with malloc and anything else that uses malloc -- though I personally wouldn't mind this, since I only use the APIs in unistd for this application and it is a learning project)

Is there a standard method (or even a specific function) for attacking this specific problem? An unknown amount of data needs to be read and saved in memory; how do you store it?

user129393192
  • I just answered something similar: [Avoid allocating a large amount of memory for concatenation in C programming](https://stackoverflow.com/a/76477992/5382650) – Craig Estey Jun 15 '23 at 02:23
  • From what i understand of this and your previous question, it seems that `realloc` is your best bet here. Either implement a single buffer (reallocate if necessary), a list of buffers (reallocate pointer buffer for each new chunk, then for each chunk `malloc` a new region) or before you run out of memory, a ring buffer would be the better choice (same as the second option but in this case, you overwrite older entries - of course only if that is an option, otherwise you have to think to swap data out into a file or something). – Erdal Küçük Jun 15 '23 at 02:32
  • Why messing around with `brk` and `sbrk` if `malloc` and friends does that already and presumably much better than you could. – Erdal Küçük Jun 15 '23 at 02:34
  • If you were doing this in C++ or golang or another modern language with collections like lists or vectors, you'd want one that uses something like a linked list (otherwise there would be too much overhead copying and resizing a buffer). I'd suggest the same for C, use a linked list (no language standard, but so many examples to copy from). – John Bayko Jun 15 '23 at 02:38
  • Re: "Why messing around with ..." Mostly because I wanna learn how to do it. I'm planning to also write my own `malloc` implementation at some point, but this is not that @ErdalKüçük. I am a student in uni. Seems like `realloc` would be a good option since large reads > buffer size are probably not a common case (but I've learned to always strive for robustness). I'll either try that out or implement a linked list ... thanks for your comment. Also @John Bayko as well, I know of `TAILQ` and some others, but like to always write my own (for now) for the practice, which is why I use C. – user129393192 Jun 15 '23 at 02:40
  • I see, you want to make your hands dirty ... have fun! – Erdal Küçük Jun 15 '23 at 02:49
  • @user129393192, Use "create a linked-list structure of buffers, filling each before allocating the next ". The goal of _contiguous_ is not really that valuable. – chux - Reinstate Monica Jun 15 '23 at 05:37
  • If contiguous is an important goal, then: mmap/mremap, or malloc/realloc. But, "unknown amount of bytes coming" contradicts "memory storage", or you have an upper bound? – Jean-Baptiste Yunès Jun 15 '23 at 09:59
  • @Jean-BaptisteYunès I'm just reading from stdin, so the upper bound could be infinity ... but obviously that's not realistic. I'm likely gonna do the malloc/realloc for this specific use. – user129393192 Jun 15 '23 at 19:12
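
The malloc/realloc approach mentioned in the comments could look roughly like the sketch below. It is only a sketch under stated assumptions: `read_all_realloc` is a hypothetical helper name, the initial capacity of 4096 and the doubling growth factor are arbitrary choices, and it assumes a POSIX `read` on a blocking file descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Read fd until EOF into one contiguous heap buffer that doubles when full.
 * Returns 0 on success and hands ownership of *out_buf to the caller,
 * who must free() it; returns -1 on read or allocation failure. */
static int read_all_realloc(int fd, char **out_buf, size_t *out_len)
{
    size_t cap = 4096;            /* arbitrary initial capacity (assumption) */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return -1;

    for (;;) {
        /* Grow the buffer when it is completely full. */
        if (len == cap) {
            if (cap > SIZE_MAX / 2) {      /* doubling would overflow */
                free(buf);
                return -1;
            }
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {           /* old block is still valid here */
                free(buf);
                return -1;
            }
            buf = grown;
            cap *= 2;
        }

        ssize_t n = read(fd, buf + len, cap - len);
        if (n < 0) {
            if (errno == EINTR)
                continue;                  /* interrupted by a signal: retry */
            free(buf);
            return -1;
        }
        if (n == 0)
            break;                         /* EOF */

        len += (size_t)n;
        assert(len <= cap);
    }

    *out_buf = buf;
    *out_len = len;
    return 0;
}
```

With `read_all_realloc(STDIN_FILENO, &buf, &len)` the whole input ends up in one contiguous block. Doubling the capacity keeps the number of `realloc` calls logarithmic in the input size, though each growth step may copy the data accumulated so far.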

0 Answers