3

I'm reading a file into memory in C by copying the bytes to a dynamic array. Currently, I realloc() one byte larger each time a new byte comes in. This seems inefficient.

Some suggest (I can't remember where) that doubling the memory each time more is needed is good because it's O(log n) allocation time, with the only expense of a worst case of just under half of the memory being unused.

Any recommendations on memory allocation?

Delan Azabani
  • 79,602
  • 28
  • 170
  • 210

3 Answers3

6

Do what some suggest (increase the size of the buffer by a multiplicative factor each time you need more room). I've done this many times and it works well. If you don't like the factor of two, you can use something else. I've used Phi (the golden ratio) to good effect.

Steve Emmerson
  • 7,702
  • 5
  • 33
  • 59
6

If you are loading the whole file into a string you could probably use the method outlined in this question. This way you can get the size of the file in bytes and allocate your string to hold that (Don't forget the extra byte for the null character).

However, if you are dynamically growing a string it is better to increase it's size by some factor that's larger than a single byte (reallocating a string each byte is going to be very slow, especially if the string has to be allocated in a new area of memory and then copied over). Since you are reading a file doubling it is probably very reasonable. I've seen people use other methods to do this as well, for example:

  1. I've seen people round to the next power of 2, for example 2, 4, 8, then 16 bytes. (which is essentially doubling the size of the file each time).

  2. I've also seen people use a value that's more suited for the strings they intend to read, ie. 100 bytes at a time.

If you over allocate the string you could always get that memory back at the end with a final reallocation to the exact size you need.

Community
  • 1
  • 1
GWW
  • 43,129
  • 11
  • 115
  • 108
  • 2
    I like to start with a size that's a power of 2 minus 1, and then double-and-add-one each time I run out of space. That ensures that the new size will always be greater than or equal to the old size without including any special handling for unsigned wrapping modulo `SIZE_MAX+1`. Simply doubling could result in stepping from a successful `SIZE_MAX/2+1` byte buffer to a "successful" 0-byte buffer and subsequent heap clobbering (assuming your system's `malloc(0)` returns non-NULL). – R.. GitHub STOP HELPING ICE Nov 07 '10 at 05:34
2

I don't have a cite for this in front of me, and it is probably an implementation-specific detail, but I believe that power-of-2-resized pointers are what are used to resize C++ STL's string objects, as characters are continually added. (It should be easy to verify this by calling the string::capacity method as characters are added.)

Alex Reynolds
  • 95,983
  • 54
  • 240
  • 345