38

I came across a situation where it would be useful to have unnecessary calls to realloc optimized out. However, it seems that neither Clang nor GCC does this (checked on Compiler Explorer, godbolt.org), although I do see optimizations being made with multiple calls to malloc.

The example:

#include <stdlib.h>

void *myfunc() {
    void *data;
    data = malloc(100);
    data = realloc(data, 200);
    return data;
}

I expected it to be optimized to something like the following:

void *myfunc() {
    return malloc(200);
}

Why does neither Clang nor GCC optimize it out? Are they not allowed to do so?

Peter Mortensen
Julius
  • 10
    I would be really surprised if a compiler was allowed to remove calls to external functions. What if you link with your own library that implements `malloc`? – Gerhardh Nov 19 '18 at 11:26
  • There is a *little* awareness of library functions in the compiler, like memset and memcpy. Certainly not malloc or realloc, they are quite often replaced. – Hans Passant Nov 19 '18 at 11:32
  • 1
    It does change malloc to calloc (https://godbolt.org/z/8UE2qw) though, and I have seen it replacing two mallocs with a single one as well. – Julius Nov 19 '18 at 11:35
  • 1
    A compiler is not allowed to optimize out a function call if that function contains any side-effects. It's quite likely that allocating memory boils down to a side effect in the end, deeper down in the API. – Lundin Nov 19 '18 at 11:35
  • 12
    @Gerhardh malloc is not an external function, it's a part of the standard library. Compilers are allowed to inline it or otherwise implement it however they wish. – n. m. could be an AI Nov 19 '18 at 12:04
  • 1
    @Lundin A compiler is allowed to do anything under the as-if rule. Malloc and friends do not have observable side effects. – n. m. could be an AI Nov 19 '18 at 12:06
  • 11
    @Lundin: It is not true that a compiler is not allowed to optimize out a function call if the function contains any side effects. A compiler is not allowed to optimize away observable behavior. If a side effect (and its consequences) is not observable, it may be removed. – Eric Postpischil Nov 19 '18 at 12:38
  • 2
    @EricPostpischil The standard seems to disagree with you, C17 5.1.2.3 §4. "...need not evaluate part of an expression if it can deduce that its value is not used and **that no needed side effects are produced (including any caused by calling a function** or accessing a volatile object).". – Lundin Nov 19 '18 at 12:45
  • 3
    @Lundin It would make sense, but, why are two consecutive calls to malloc/free optimized out (https://godbolt.org/z/gBVXcp)? That wouldn't be allowed if it had a side effect, would it? – Julius Nov 19 '18 at 12:49
  • 13
    @Lundin: An unobservable side effect is not needed. – Eric Postpischil Nov 19 '18 at 12:56
  • 2
    @Julius [tough crowd tonight](https://www.cartoonstock.com/directory/t/tough_crowds.asp)! – chux - Reinstate Monica Nov 19 '18 at 18:42
  • 2
    @Gerhardh: You need to compile with `gcc -fno-builtin-realloc` or `gcc -fno-builtin` if you want to define your own `realloc`. Then it will be treated like any other external function, where calling it is a visible side-effect that optimization must preserve. See [GCC with -fno-builtin does not seem to work](https://stackoverflow.com/q/25272576) for an example. Current gcc will optimize `malloc`/`memset` into `calloc`. – Peter Cordes Nov 19 '18 at 20:31
  • 2
    realloc is used orders of magnitude less often than malloc/free, so it is a better use of compiler writers' effort to ignore realloc. Which doesn't mean that if someone motivated implements such an optimization it won't be happily integrated. – Marc Glisse Nov 20 '18 at 07:24

5 Answers

25

Are they not allowed to do so?

Maybe, but the optimization may be skipped in this case because the two versions behave differently in a corner case.


If 150 bytes of allocatable memory remain,
data = malloc(100); data = realloc(data, 200); returns NULL with 100 bytes consumed (and leaked) and 50 remain.

data = malloc(200); returns NULL with 0 bytes consumed (none leaked) and 150 remain.

Different functionality in this narrow case may prevent optimization.
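The corner case above can be traced with a small accounting model. This is only a sketch under stated assumptions: `budget`, `budget_alloc`, and `budget_grow` are hypothetical names, no real memory is allocated, and growing a block is assumed to need only the extra bytes (an in-place realloc).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a heap: it only counts bytes and allocates nothing. */
typedef struct {
    size_t capacity; /* total allocatable bytes */
    size_t used;     /* bytes currently handed out */
} budget;

/* Models malloc(n): succeeds only if n more bytes fit. */
static bool budget_alloc(budget *b, size_t n) {
    if (n > b->capacity - b->used) return false;
    b->used += n;
    return true;
}

/* Models realloc growing a block from old_n to new_n (new_n >= old_n).
   Assumption: only the extra bytes are needed. On failure the old block
   stays allocated, as with the real realloc. */
static bool budget_grow(budget *b, size_t old_n, size_t new_n) {
    if (new_n - old_n > b->capacity - b->used) return false;
    b->used += new_n - old_n;
    return true;
}

/* Bytes still consumed after malloc(100) followed by realloc(..., 200). */
size_t used_after_malloc_then_realloc(size_t capacity) {
    budget b = { capacity, 0 };
    if (budget_alloc(&b, 100)) {
        budget_grow(&b, 100, 200); /* result ignored, as in the question's code */
    }
    return b.used;
}

/* Bytes consumed after a single malloc(200). */
size_t used_after_single_malloc(size_t capacity) {
    budget b = { capacity, 0 };
    budget_alloc(&b, 200);
    return b.used;
}
```

With a capacity of 150, the two-step version ends with 100 bytes consumed and the single call ends with 0, which is the difference described above.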


Are compilers allowed to optimize-out realloc?

Perhaps - I would expect it is allowed. Yet it may not be worth the effort to enhance the compiler to determine when it can.

Successful malloc(n); ... realloc(p, 2*n) differs from malloc(2*n); when ... may have set some of the memory.

It might be beyond the compiler's design to prove that the ... code, even if empty, did not set any memory.
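The content difference can be made concrete with a short sketch. The function name `first_byte_after_grow` is hypothetical, and the single store to `d[0]` stands in for the `...` code between the two calls.

```c
#include <stdlib.h>

/* Returns the first byte of a block after growing it from n to 2*n bytes,
   or -1 if either allocation fails. Assumes n >= 1. */
int first_byte_after_grow(size_t n) {
    unsigned char *d = malloc(n);
    if (d == NULL) return -1;
    d[0] = 42;                     /* the "..." code writes to the block */

    unsigned char *bigger = realloc(d, 2 * n);
    if (bigger == NULL) {
        free(d);                   /* on failure the old block is still valid */
        return -1;
    }
    int first = bigger[0];         /* realloc preserves the old contents */
    free(bigger);
    return first;
}
```

Replacing the pair with a plain `malloc(2 * n)` would leave that first byte indeterminate unless the compiler also kept the store, so it has to track what happened to the block in between.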

StayOnTarget
chux - Reinstate Monica
  • 3
    I was thinking of that as well. However, [this example](https://godbolt.org/z/VaN2Ry) shows that a realloc between prevents the malloc/free from being optimized out. If you remove it, the compiler will optimize-out the malloc and free. - and as far as I can see there will be no difference in the result? – Julius Nov 19 '18 at 13:19
  • @Julius, in your code snippet, `// pu = realloc(data, 200);` leads to UB with `if(pu != NULL)`. If one also comments out `if(pu != NULL) { data = pu; }`, then your point is better made and I too see that the code _could_ be optimized. IMO, the compiler could not digest the code (too complex) and see the optimization. – chux - Reinstate Monica Nov 19 '18 at 13:34
  • you are probably right about the example being a bit too complex. I have [another example](https://godbolt.org/z/z4mGaT) which demonstrates that ``malloc`` is optimized away, while ``realloc`` isn't - although the result should be the same in both cases? To me it looks like ``realloc`` is just not optimized while ``malloc`` is. – Julius Nov 19 '18 at 14:11
  • This answer implies that leaked (unreachable) memory is an observable feature of the C execution model; ie, you cannot optimize it away. Was that intended? – Yakk - Adam Nevraumont Nov 19 '18 at 15:56
  • @Yakk-AdamNevraumont The leak is an effect of the code, yet it is not the key functional difference. The amount of memory consumed (100 vs. 0) is a functional difference. – chux - Reinstate Monica Nov 19 '18 at 16:29
  • @chux I'm aware it makes a difference in physical hardware, but I'm talking about in the abstraction C is specified in. You can implement `free` as never freeing memory in a conforming C compiler as an example; this is quite different in hardware (as memory isn't freed), but no difference to the specification of C. And compiler optimization exists in that difference, between "naive mapping" of code and "what the compiler actually requires here". – Yakk - Adam Nevraumont Nov 19 '18 at 16:40
  • @Yakk-AdamNevraumont OP's example was put forth as a good candidate for allocation optimization: why is code A not optimized into code B? This answer addresses how A differs from B. Since the two differ in one case, a compiler might not optimize because of it. Your point seems to be whether, even with this narrow difference, a compiler can optimize or not: maybe, maybe not, per the abstract machine. My point is that, to explore allocation optimization, one should use code C and D with no functional differences, as OP provided in the first comment above. – chux - Reinstate Monica Nov 19 '18 at 16:57
  • The Standard never requires that allocation requests succeed, nor is the state of unallocated storage observable. A conforming implementation could replace any `malloc()` request with a larger one, and treat as a no-op any realloc() request whose source pointer cannot possibly identify a block smaller than the requested value. – supercat Nov 19 '18 at 17:09
  • 2
    @Julius Per [this code](https://stackoverflow.com/questions/53373421/are-compilers-allowed-to-optimize-out-realloc/53375442#comment93628477_53375442), I see no reason barring optimization. Yet consider if both versions started with `char *data = malloc(100); if (data == NULL) { return NULL; } *data = 1`: then the functions are different. With `realloc()` copying the first 100 bytes, the compiler may not see that copying uninitialized bytes was not important in your code. BTW: the 2nd compiler is C++, not C. I suggest comparing C to C. – chux - Reinstate Monica Nov 19 '18 at 17:21
  • 1
    @Julius In other words, successful `malloc(n); ... realloc(p, 2*n)` differs from `malloc(2*n);` when `...` may have set some of the memory. It might be beyond the compiler's design to ensure that the `...` code did not set any memory. – chux - Reinstate Monica Nov 19 '18 at 17:37
  • 2
    @chux That's an interesting thought. I could imagine it to be quite difficult to prove that there were no changes to the specific memory region in some cases - although it is most likely simple in other cases. – Julius Nov 19 '18 at 17:47
  • 2
    How does `malloc(n); ... realloc(p, 2*n)` differ from `malloc(2*n); ...`? – Andrew Svietlichnyy Nov 19 '18 at 20:38
  • @AndrewSvietlichnyy: `realloc(p, 2*n)` may fail and return `NULL`, in which case [the original pointer `p` remains a valid pointer](https://stackoverflow.com/questions/1607004/does-realloc-free-the-former-buffer-if-it-fails) to `n` bytes of memory. – Ilmari Karonen Nov 19 '18 at 21:16
  • @AndrewSvietlichnyy Consider successful allocations and `char *d = malloc(n); *d = 42; return realloc(d, 2*n)` vs `return malloc(2*n);`. Both return a pointer to `2*n` bytes. The first retains `d[0]` as 42. The 2nd, `d[0]` is uninitialized. – chux - Reinstate Monica Nov 19 '18 at 21:23
  • 1
    @chux `return` is performed after `realloc`, so the compiler can do `d = malloc(2*n); *d = 42; return d;`. Because as we know that `var = expr; return var` is the same as `return expr;`, given the same types. – Andrew Svietlichnyy Nov 19 '18 at 23:00
11

A compiler which bundles its own self-contained versions of malloc/calloc/free/realloc could legitimately perform the indicated optimization if the authors thought doing so was worth the effort. A compiler that chains to externally-supplied functions could still perform such optimizations if it documented that it did not regard the precise sequence of calls to such functions as an observable side-effect, but such treatment could be a bit more tenuous.

If no storage is allocated or deallocated between the malloc() and realloc(), the size of the realloc() is known when the malloc() is performed, and the realloc() size is larger than the malloc() size, then it may make sense to consolidate the malloc() and realloc() operations into a single larger allocation. If the state of memory could change in the interim, however, then such an optimization might cause the failure of operations that should have succeeded. For example, given the sequence:

void *p1 = malloc(2000000000);
void *p2 = malloc(2);
free(p1);
p2 = realloc(p2, 2000000000);

a system might not have 2000000000 bytes available for p2 until after p1 is freed. If it were to change the code to:

void *p1 = malloc(2000000000);
void *p2 = malloc(2000000000);
free(p1);

that would result in the allocation of p2 failing. Because the Standard never guarantees that allocation requests will succeed, such behavior would not be non-conforming. On the other hand, the following would also be a "conforming" implementation:

void *malloc(size_t size) { return 0; }
void *calloc(size_t size, size_t count) { return 0; }
void free(void *p) {  }
void *realloc(void *p, size_t size) { return 0; }

Such an implementation might arguably be regarded as more "efficient" than most others, but one would have to be rather obtuse to regard it as being very useful except, perhaps, in rare situations where the above functions are called on code paths that are never executed.

I think the Standard would clearly allow the optimization, at least in cases that are as simple as those in the original question. Even in cases where it might cause operations to fail that could otherwise have succeeded, the Standard would still allow it. Most likely, the reason that many compilers don't perform the optimization is that the authors didn't think the benefits would be sufficient to justify the effort required to identify cases where it would be safe and useful.
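The p1/p2 sequence discussed above can be traced with a byte-counting model. This is a sketch, not a real allocator: `heap_model`, `model_alloc`, and `model_free` are hypothetical names, fragmentation is ignored, and realloc is modeled as "allocate the new block, then release the old one".

```c
#include <stdbool.h>

/* Hypothetical byte-counting model of a heap; no real memory is touched. */
typedef struct {
    unsigned long long capacity;
    unsigned long long used;
} heap_model;

static bool model_alloc(heap_model *h, unsigned long long n) {
    if (n > h->capacity - h->used) return false;
    h->used += n;
    return true;
}

static void model_free(heap_model *h, unsigned long long n) {
    h->used -= n;
}

/* p1 = malloc(big); p2 = malloc(2); free(p1); p2 = realloc(p2, big);
   Returns true if every step succeeds in the model. */
bool original_sequence_succeeds(unsigned long long capacity) {
    const unsigned long long big = 2000000000ULL;
    heap_model h = { capacity, 0 };
    if (!model_alloc(&h, big)) return false;  /* p1 */
    if (!model_alloc(&h, 2)) return false;    /* p2 */
    model_free(&h, big);                      /* free(p1) */
    if (!model_alloc(&h, big)) return false;  /* realloc: new block for p2 */
    model_free(&h, 2);                        /* realloc: old 2-byte block released */
    return true;
}

/* p1 = malloc(big); p2 = malloc(big); free(p1);
   Returns true if both allocations succeed in the model. */
bool consolidated_sequence_succeeds(unsigned long long capacity) {
    const unsigned long long big = 2000000000ULL;
    heap_model h = { capacity, 0 };
    if (!model_alloc(&h, big)) return false;  /* p1 */
    if (!model_alloc(&h, big)) return false;  /* p2, requested at full size up front */
    model_free(&h, big);                      /* free(p1) */
    return true;
}
```

With a modeled capacity of 2000000002 bytes, the original order succeeds and the consolidated order fails at the second allocation, which matches the scenario described in this answer.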

supercat
  • Standards wise, no. C99 and C11 explicitly state the old object is deallocated and a new object is returned. Even with a private allocator the compiler cannot predict at compile time the pointer to a new allocation. – Greg A. Woods Nov 19 '18 at 20:50
  • 1
    @GregA.Woods: Under the as-if rule, a compiler would be allowed to consolidate the operations if the resulting behavior could have resulted from doing the operations separately. By what standard-defined means could a program observe whether `realloc` actually did anything other than yield a pointer to an allocation that might have already been as big as requested? – supercat Nov 19 '18 at 21:12
  • 1
    I suppose if the compiler's own allocator was known to always initially allocate more, say ten times as much, space as is required, and if the compiler could predict at compile time that the allocated object's desired size was *X*, and also predict that the new size passed to `realloc()` was less than *10X* then it could assume the object would not change location. However I don't know if the optimization would still be allowed by the current standard. Perhaps. – Greg A. Woods Nov 19 '18 at 21:48
4

The compiler is allowed to optimize out multiple calls to functions which are considered pure functions, i.e., functions that do not have any side-effects.

So the question is whether realloc() is a pure function or not.

The C11 Standard Committee Draft N1570 states this about the realloc function:

7.22.3.5 The realloc function ... 2. The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size. The contents of the new object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.

Returns 4. The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object could not be allocated.

Notice that the compiler cannot predict at compile time the value of the pointer that will be returned from each call.

This means that realloc() cannot be considered a pure function, and multiple calls to it cannot be optimized out by the compiler.
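One way to see the "not pure" part: a pure function called twice with the same argument yields the same result, whereas two live allocations of the same size must not overlap. A minimal sketch follows; the function name is hypothetical.

```c
#include <stdlib.h>

/* Returns 1 if two live malloc(size) calls yield different pointers,
   0 if they are equal, and -1 if either allocation fails.
   Assumes size > 0. */
int same_request_gives_distinct_blocks(size_t size) {
    void *a = malloc(size);
    void *b = malloc(size);
    int result = -1;
    if (a != NULL && b != NULL) {
        result = (a != b);
    }
    free(a); /* free(NULL) is a no-op, so this is safe on failure too */
    free(b);
    return result;
}
```

This only illustrates that identical requests give different results. It does not by itself settle what the as-if rule permits.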

Peter Mortensen
P.W
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/183970/discussion-on-answer-by-p-w-are-compilers-allowed-to-optimize-out-realloc). –  Nov 20 '18 at 16:11
1

But you're not checking the return value of the first malloc() which you're then using in the second realloc(). It could just as well be NULL.

How could the compiler optimize the two calls into a single one without making unwarranted assumptions about the return value of the first?
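For reference, a version of the question's function that does check both results might look like the sketch below. The name `myfunc_checked` is hypothetical. Note also that the standard specifies `realloc(NULL, size)` to behave like `malloc(size)`, so the unchecked original does not dereference a null pointer when the first call fails.

```c
#include <stdlib.h>

/* Returns a 200-byte block, or NULL if either allocation fails.
   Nothing is leaked on failure. */
void *myfunc_checked(void) {
    void *data = malloc(100);
    if (data == NULL) {
        return NULL;        /* first allocation failed */
    }
    void *bigger = realloc(data, 200);
    if (bigger == NULL) {
        free(data);         /* realloc failed: the 100-byte block is still valid */
        return NULL;
    }
    return bigger;
}
```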

Then there is another possible scenario. FreeBSD used to have a realloc() which was basically malloc + memcpy + free the old pointer.

Suppose that there are only 230 bytes left of free memory. In that implementation, ptr = malloc(100) followed by realloc(ptr, 200) will fail, but a single malloc(200) will succeed.
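A copy-based realloc of the kind described can be sketched as follows. This is an illustration, not FreeBSD's code: `copying_realloc` is a hypothetical name, and the caller passes the old size explicitly because portable C has no way to query it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical realloc built from malloc + memcpy + free.
   Returns the new block, or NULL with the old block left untouched. */
void *copying_realloc(void *old, size_t old_size, size_t new_size) {
    void *fresh = malloc(new_size);
    if (fresh == NULL) {
        return NULL;                   /* old block remains allocated */
    }
    if (old != NULL) {
        size_t keep = old_size < new_size ? old_size : new_size;
        memcpy(fresh, old, keep);      /* both blocks are live at this point */
        free(old);
    }
    return fresh;
}
```

Because the old and new blocks coexist during the copy, growing 100 bytes to 200 needs 300 bytes at its peak, which is why it fails with 230 bytes free while a single malloc(200) fits.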

  • You are right about the checking, but I have submitted at [least one example](https://godbolt.org/z/z4mGaT) in the comments which include checking the return value - it doesn't seem to make a difference. Actually, the compiler does make such assumptions sometimes which I could [demonstrate](https://godbolt.org/z/mFIrwX). – Julius Nov 20 '18 at 07:59
0

My understanding is that such an optimization might be forbidden (notably for the admittedly unlikely case where the malloc succeeds but the following realloc fails).

You could suppose that malloc and realloc always succeed (that is against the C11 standard, n1570; look also into my joke-implementation of malloc). In that hypothesis (stricto sensu wrong, but some Linux systems have memory overcommitment to give that illusion), if you use GCC, you might write your own GCC plugin to make such an optimization.

I am not sure it is worth spending a few weeks or months coding such a GCC plugin (in practice, you would probably want it to handle some code between the malloc and the realloc, and then it is not that simple, since you have to characterize and detect which in-between code is acceptable), but that choice is yours.

Basile Starynkevitch