1

I'm wondering how much a realloc() really costs in terms of performance: I'm calling it quite often to extend an existing memory area by one element (a specific structure). Thanks to the MMU, is such a realloc() just an extension of the reserved memory area, or can a complete copy of all the data happen under some conditions?

As far as I know, a std::vector very often has to copy its memory area when its size increases and the previously reserved amount of memory is too small...

Elmi
  • @Olaf please read my posting carefully: I do not extend a struct, but I extend a memory area by one element, which happens to be a struct. In any case, none of this was my question... – Elmi Sep 12 '16 at 11:25
  • "I'm doing it quite often to **extend an available memory area by one element** (=specific structure)" - Reads exactly like your code includes wildly casting (which can easily result in UB). – too honest for this site Sep 12 '16 at 11:28

3 Answers

5

realloc copies all the data. Assuming anything else is just asking for performance trouble. The situations when realloc can avoid copying are few and you should absolutely not count on them. I've seen more than one implementation of realloc that doesn't even bother implementing the code to avoid copying because it's not worth the effort.

The MMU has nothing to do with it, because the cost of remapping the pages backing an allocation doesn't pay off until the allocation spans more than two pages. This is based on research I read 15 years ago, and since then memory copying has become faster, while memory management has become more expensive because of MP systems. That research was also about zero-copy schemes inside the kernel only, without paying the syscall overhead, which is significant and would slow things down here. Remapping would also require your allocation to be perfectly aligned and sized, further reducing the usefulness of implementing realloc this way.

At best realloc can avoid copying data if the memory chunk it would expand into is not allocated. If realloc is the only thing your application does you might get lucky, but as soon as there's just a little fragmentation or other things allocate, you're out of luck. Always assume that realloc is malloc(new_size); memcpy(new, old, old_size); free(old);.
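As a conservative mental model, you can think of realloc as the sketch below. (realloc_model is a made-up name for illustration; a real realloc also handles a NULL old pointer, shrinking, and growing in place when it can.)

#include <stdlib.h>
#include <string.h>

/* Hypothetical worst-case model of realloc: always allocate, copy, free.
 * The caller has to pass old_size because only the real allocator knows
 * it internally. */
void *realloc_model(void *old, size_t old_size, size_t new_size)
{
    void *new = malloc(new_size);
    if (new == NULL)
        return NULL;  /* like realloc: leave the old block untouched on failure */
    memcpy(new, old, old_size < new_size ? old_size : new_size);
    free(old);
    return new;
}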

A good practice when resizing arrays with realloc is to keep track of how many elements the array holds and, separately, its capacity. Call realloc only when the number of elements hits the capacity, and grow the capacity by 1.5x each time (most people use 2x, and it's often recommended in the literature, but 2x causes quite bad memory fragmentation problems, while 1.5x is almost as efficient and much kinder to the allocator). Something like this:

if (a->sz == a->cap) {
    /* Out of room: grow capacity by 1.5x, or start at INITIAL_CAP. */
    size_t ncap = a->cap ? a->cap + a->cap / 2 : INITIAL_CAP;
    void *n = realloc(a->a, ncap * sizeof(*a->a));
    if (n == NULL)
        deal_with_the_error();  /* a->a is still valid at this point */
    a->a = n;
    a->cap = ncap;
}
a->a[a->sz++] = new_element;

This works even for the initial allocation, if the struct containing the array is zero-initialized.
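For completeness, a minimal sketch of a struct the snippet could operate on (the field names a, sz and cap match the code above; the element type int and the value of INITIAL_CAP are just assumptions):

#include <stdlib.h>

#define INITIAL_CAP 8        /* assumed starting capacity */

struct dynarray {
    int *a;                  /* element storage; int is only an example */
    size_t sz;               /* number of elements in use */
    size_t cap;              /* allocated capacity, in elements */
};

struct dynarray arr = {0};   /* zero-initialized: a == NULL, cap == 0, so the
                                first push takes the realloc branch above
                                (realloc(NULL, n) behaves like malloc(n)) */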

Art
1

Copying data is not the expensive part (though some may disagree). Hitting the underlying malloc and free is what's expensive, and it can account for almost all of your execution time, depending on what else you are doing. If so, fixing it should give you a big speedup.

This is how I tell what fraction of the time things like this take.

The simplest solution is to do it less often: when you allocate an array, allocate it extra large, and then keep track yourself of how much of it you are actually using. For instance:
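A rough sketch of that idea (RESERVE and struct item are made-up placeholders; pick a bound that fits your workload):

#include <stdlib.h>

#define RESERVE 4096                 /* assumed upper bound on element count */

struct item { int value; };          /* placeholder element type */

int fill(void)
{
    /* One allocation up front... */
    struct item *items = malloc(RESERVE * sizeof *items);
    size_t used = 0;
    if (items == NULL)
        return -1;

    /* ...then appending never touches the allocator again. */
    while (used < RESERVE)
        items[used++] = (struct item){ .value = (int)used };

    free(items);
    return 0;
}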

Mike Dunlavey
  • "Copying data is not the expensive part." Are you sure? The cost of copying the data if we're copying on every element would be O(n^2), while the cost of `malloc` would just be somewhere between O(n) and O(n log n) with a reasonable `malloc` implementation. – Art Sep 12 '16 at 12:14
  • @Art: Be pragmatic. Take a few stack samples and just see what it's doing. I'm relaying my experience, where memory allocation and freeing can easily be the dominant activity in a program. Big-O is OK as far as it goes, but it ignores constant factors, so it is less useful than sampling in real-world software. – Mike Dunlavey Sep 12 '16 at 12:31
  • memcpy() is indeed not very expensive. I had a bug in my application that caused far too many memcpy() calls, but this did not result in a large, time-consuming CPU load; I only found the problem because the related threads did not cause the expected CPU load (compared to an operation that really does some computation). – Elmi Sep 12 '16 at 12:39
  • I am pragmatic. I spent the 90s reading papers about zero-copy schemes, which were all the rage back then (until maybe 10 years ago, when it died down because zero-copy schemes are wildly impractical), because memory copying is such a bloody expensive thing. I've also spent a lot of time working inside memory allocators and know that `malloc` and `free` are pretty cheap today, while copying memory is still painfully slow. Every other profile I look at has `memcpy` at the top. – Art Sep 12 '16 at 12:41
  • @Art: We could debate whether *memcpy* or *malloc/free* are cheaper, and it may depend on our specific experiences, but I think that's the lesser point. I've seen between 50 and 90 percent of time going into those things, and in every case it was not very hard to simply call it a *lot* less, saving most of that time. – Mike Dunlavey Sep 12 '16 at 12:59
0

The behavior really depends on the implementation, but all implementations try to minimize the cost of relocating memory, because relocation is expensive: every byte has to be copied, and it has a direct impact on the cache. For example, when more space is needed and the runtime has the choice between relocating the memory and extending the currently reserved block in place, it chooses the latter. But it is not that simple: the allocator also has to consider memory fragmentation, so there are several trade-offs to satisfy. You can observe which case you got with an experiment like the one sketched below.
In the case of the std::vector you mentioned, a different scheme is used: when the vector runs out of reserved space, it allocates a new block that is a constant factor larger than the current one (typically 1.5x or 2x, depending on the implementation) to reduce the chance of another relocation soon.
I'm not deeply familiar with the individual implementations; hopefully others can give you more specific information.
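A quick experiment to see whether realloc extended in place or moved the block on your platform (note: after a successful realloc the old pointer value is indeterminate, so this compares addresses saved as integers rather than using the stale pointer):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void)
{
    char *p = malloc(1024);
    if (p == NULL)
        return 1;
    uintptr_t before = (uintptr_t)p;   /* remember the old address as an integer */

    char *q = realloc(p, 64 * 1024);   /* grow well past the original size */
    if (q == NULL) {
        free(p);
        return 1;
    }
    puts((uintptr_t)q == before ? "extended in place"
                                : "moved: the contents were copied");
    free(q);
    return 0;
}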

Ahmad Siavashi