
What is the difference between

int *p = malloc( h * w * sizeof(*p) );

and

int *p = malloc( sizeof (*p) * h * w );

when h and w are of type int?

Why is it safer to put sizeof(*p) in the first rather than in the last place in malloc?


I already understood that the latter form is used to ensure size_t math, and that the int operands are widened to size_t before the calculation is done to prevent signed integer overflow, as said here and here, but I don't quite understand how exactly it works.

  • 1
    why speaking specifically about `size_of` and about `malloc` when the question is not at that level ? – bruno Aug 12 '20 at 15:03
  • @bruno The issue is closely related to be used with `malloc()`. Reference: [here](https://stackoverflow.com/a/41595104/12139179) and [here](https://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc/62351067#comment56158851_605856) – RobertS supports Monica Cellio Aug 12 '20 at 15:05
  • 2
    the problem is about overflow in a expression, whatever the use of the result and from where the numbers come from. *sizeof* and *malloc* are just an example of use, even a good one – bruno Aug 12 '20 at 15:07
  • @bruno My intention was it to bring it explicitly in the context of `malloc()` although it may be applied to other cases, too. – RobertS supports Monica Cellio Aug 12 '20 at 15:29
  • Ideally, the code should be made "safe" rather than "more safe", although controlling order of integer promotion is a consideration for both of those. – Ian Abbott Aug 12 '20 at 16:39
  • 3
    I like `sizeof` last for easy conversion to `calloc()`: `ptr = malloc(42 * sizeof *ptr);` ==> `ptr = calloc(42, sizeof *ptr);` – pmg Aug 12 '20 at 17:15

3 Answers

4

When you write the sizeof operation first, you usually ensure that the calculation is done in at least size_t math. Let's look at what this means.


Problem with placing sizeof(X) last:

Imagine a scenario in which h has the value 200000 and w has the value 50000 (perhaps obtained by accident).

Assume that the maximum value an int can hold is 2147483647, which is common (the exact implementation-defined value is given by the macro INT_MAX in the header <limits.h>). Both are legitimate values an int can hold.

If you now use malloc( h * w * sizeof(*p) );, the part h * w is evaluated first, because the * operator is left-associative. With this you get a signed integer overflow, as the result 10000000000 (10 billion) cannot be represented in an int.

The behavior of a program in which signed integer overflow happens is undefined. The C standard even gives integer overflow as an example in its specification of undefined behavior:

3.4.3

1 undefined behavior

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements

2 Note 1 to entry: Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

3 Note 2 to entry: J.2 gives an overview over properties of C programs that lead to undefined behavior.

4 EXAMPLE An example of undefined behavior is the behavior on integer overflow.

Source: C18, §3.4.3


Placing sizeof(X) first instead:

If you instead put the sizeof operation first, as in malloc( sizeof(*p) * h * w );, you usually don't run the risk of an integer overflow.

This is for two reasons.

  1. sizeof yields a value of the unsigned integer type size_t. On most modern implementations, size_t has a higher integer conversion rank and a larger size than int. Common values are sizeof(size_t) == 8 and sizeof(int) == 4.

    This is important for point 2, the usual arithmetic conversions that are applied to the operands of arithmetic expressions.

  2. In expressions, automatic type conversions of the operands often happen. These are known as the integer promotions and the usual arithmetic conversions (implicit conversions). For more information about this, you can take a look at this useful FAQ.

    For these conversions, the integer conversion rank of a type is important: the operand with the lesser integer conversion rank is converted to the type of the operand with the greater rank.

    Looking at the exact phrases from the C standard:

    "Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank."

    Source: C18, §6.3.1.8/1

    Conversions of signedness can also happen here, and that is what matters in this case, as described later on.

    "Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type."

    ....

    "Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type."

    Source: C18, §6.3.1.8/1

    Since size_t has an integer conversion rank greater than or equal to that of int, and int is not able to represent all values of size_t (which holds because int is usually smaller than size_t, as said earlier), the operands h and w of type int are converted to size_t before the multiplication.


Importance of the signedness conversion to an unsigned integer:

Now you might ask: why is the signedness conversion to an unsigned integer type important?

There are two reasons here, of which the second is the more important, but for the sake of completeness I want to cover both.

  1. An unsigned integer type always has a wider positive range than a signed integer type of the same integer conversion rank. This is because a signed type must also represent a negative range of values. An unsigned type has no negative range and can therefore represent almost twice as many positive values as a signed one.

    But even more important:

  2. An unsigned integer can never overflow!

"A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type."

Source: C18, §6.2.5/9 (emphasis mine)

That's the reason why placing the sizeof operation first, as in malloc( sizeof(*p) * h * w );, is safer.

However, if you do exceed the limit of the unsigned type, the value wraps around and the allocated memory will be too small for the desired purpose. Accessing memory beyond the allocation would also invoke undefined behavior.

But nonetheless, it protects you from undefined behavior in the computation of the size argument to malloc() itself.


Side notes:

  • Note that placing sizeof in the second position, malloc( h * sizeof(*p) * w ), technically achieves the same effect, although it may decrease readability.

  • If the arithmetic expression in the call to malloc() has only one or two operands (e.g. sizeof(x) and one int), the order doesn't matter. But to stick to one convention, I would recommend always placing sizeof() first: malloc(sizeof(int) * 4). That way you don't risk accidentally forgetting it when there happen to be two int operands.

  • Using an unsigned integer type like size_t for h and w can also be a smarter alternative. It ensures that no undefined overflow happens in the first place and, besides that, is more appropriate since h and w aren't meant to hold negative values.



  • 7
    I almost voted up, but I have to conclude this answer is overblown. First, the point that “An unsigned integer can never overflow” is overstated. It wraps instead of having undefined behavior, but that is of no avail to a programmer allocating memory, as the wrapping will result in a erroneously too small allocation, after which using the desired but unallocated memory will result in undefined behavior, so nothing has been achieved from the fact that unsigned arithmetic does not overflow. The getting the `size_t` type by putting `sizeof` first is that it is done in a type that commonly… – Eric Postpischil Aug 12 '20 at 15:05
  • 3
    … can handle arithmetic for all supported allocation sizes, not that it can never “overflow.” Second, the standard citations are excessive; we do not need lots of supporting material to understand and accept that signed integer arithmetic overflows have undefined behavior and so on. – Eric Postpischil Aug 12 '20 at 15:06
  • @EricPostpischil I took the first concern into it. Practically you are right, but I think it is maybe worth to at least give a little more protection. Given the interesting question: Does it matter what UB exactly causes? The placement of `sizeof` that way has also been already subject [here](https://stackoverflow.com/a/41595104/12139179), [here](https://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc/62351067#comment56158851_605856) and [here](https://stackoverflow.com/a/605858/12139179). This is also where I got my influence from to dedicate an own post to it. – RobertS supports Monica Cellio Aug 12 '20 at 15:28
  • 3
    *Using an unsigned integer type like size_t for h and w can also be a smarter alternative. It insures that no undefined overflow will happen in the first place and beside that it's more appropriate as h and w aren't meant to have negative values.* Not safe enough: `p = malloc(h * w * sizeof(*p));` with `unsigned int` `h` and `w` does not invoke undefined behavior but still produces an incorrect result if `h * w` exceeds `UINT_MAX`, instructing `malloc()` to allocate too little memory. – chqrlie Aug 12 '20 at 15:31
  • 1
    The resulting type of an expression does not depend on the ordering of its operand.`a*b` has the same resulting type aa `b*a` – wildplasser Aug 12 '20 at 21:52
3

Why is it safer to put sizeof(*p) in the first rather than in the last place in malloc?

The short answer is: it isn't (or at least shouldn't be) safer.

Longer answer:

Any integer calculation may overflow - some overflows result in undefined behavior, others in incorrect results and (maybe) subsequent program failure.

Any integer calculation must take into account whether overflow can happen. If you write a program that allocates more than 2 GB of memory in a single malloc call, I'm sure you are already aware of that and have ensured that both h and w have appropriate types and values.

Further, the standard doesn't say exactly what the maximum values of the integer types are. So if you want to program "safe", make sure to check those limits (e.g. via the macros in <limits.h>) before doing the calculation.

To put it another way: "more safe" is not a programming goal. If you write programs that operate on the edge, you make them safe - not just "more safe".

Support Ukraine
  • _"It isn't (or at least shouldn't be) more safe - Any integer calculation may overflow"_ pedantically speaking, the question didn't contain any statement about absolute safety. So a wider range can be **safer** even if it doesn't provide absolute safety. – Roberto Caboni Aug 12 '20 at 16:18
  • 3
    @RobertoCaboni That's not the point here. The point is whether you want to make "always pu `sizeof` a "good pratice rule". My point is that there is nothing to support that rule. If you are dealing with memory allocations of that size, you really have to program "more safe" than just putting `sizeof` first. IMO put `sizeof` first shall never be a rule. – Support Ukraine Aug 12 '20 at 16:22
  • 1
    I understand your point, but when you write _"If you are dealing with memory allocations of that size"_ you are assuming 32bit unsigned integer. But why excluding scenarios in which `sizeof (int) = 2` (I worked for years with such a platform)? My point is that there's nothing wrong in bearing in mind that `malloc (uint8_var1 * uint8_var2 * sizeof(x))` could lead to undesired results, even if `max(uint8_t)` is only 256. Placing `sizeof ()` as the first parameter in thi case would be safer. – Roberto Caboni Aug 12 '20 at 17:58
  • I think with your argumentation before it would make more sense to change "I'm sure you are already aware of that and have ensured that both `h` and `w` have appropriate *types*." to "I'm sure you are already aware of that and have ensured that both `h` and `w` have appropriate *values* (that overflow doesn't happen)." – RobertS supports Monica Cellio Aug 12 '20 at 18:01
  • "*Further, the standard doesn't tell exactly what the maximum values for integer types are. So if you want to program "safe" make sure to ask about those limits at run time.*" - The exact values are irrelevant as well as ask about them explicitly. What you mean I think is to ask for them by using the macros in `limits.h` **and** do some conditional checks before allocating the memory whether an overflow will happen or not. – RobertS supports Monica Cellio Aug 12 '20 at 18:07
  • "*To put in another way: "more safe" is not a programming goal. If you write programs that operate on the edge, you make them safe - not just "more safe"*" - I think I share the same opinion but nonetheless there are cases where you just can't insure complete safety but just have to implement one way because that is the task to do. Therefore, I argued with making it "*more safe*" than to make something "*safe*" which can't be accomplished. Just adding an extra layer of security. – RobertS supports Monica Cellio Aug 12 '20 at 18:13
  • @RobertSsupportsMonicaCellio "... but nonetheless there are cases where you just can't insure complete safety ..." Please present documentation for such a case, thanks – Support Ukraine Aug 12 '20 at 18:54
1

While multiplication is commutative, compilers apparently do not scan ahead for the largest type. sizeof() has type size_t, which on 64-bit machines is typically a 64-bit unsigned type (maximum 2^64 - 1), so the order is important for preventing overflow, which in many languages happens silently, even though all CPUs provide that information as a status bit if not as an interrupt! Of course, some programmers want a silent overflow to get the residue mod the type size, but that's a sad reason to make the rest of us suffer!

  • 2
    It's not really the compilers fault - the language requires this behavior due to the way the operators and the rules for their precedence and order of evaluation are defined. – Hulk Aug 12 '20 at 16:17
  • Also, this question is about C - it doesn't matter what other languages defined or might have defined based on the features of CPUs. C is what it is, and that includes UB and somewhat unintuitive type promotion rules. – Hulk Aug 12 '20 at 16:30
  • Commutativity is not the key point here, associativity is. Yet the C Standard defines the `*` operator as left-associative, giving a precise and compelling rule for the interpretation of `w * h * sizeof(*p)` for both the type and the value of the intermediary result. – chqrlie Aug 13 '20 at 20:25
  • C++ has expanded to support a lot of things that were errors, so there is hope for a newer version of the compiler that rearranges actions that are commutative in the math world so they are safest in the program world. And if you are casting the result to a size_t, all unsigned 64 bit math seems eminently sensible on a 64 bit compiler. – David G. Pickett Aug 18 '20 at 20:43