5

Which if any of the following does "the right thing" in a standards-compliant manner? You can assume that m and n are of type int (signed integer). The main issue is signed integer overflow.

Sample 1.

size_t bytes = n * m;
if (n > 0 && m > 0 && SIZE_MAX/n >= m) {
    /* allocate “bytes” space */
}

Sample 2.

if (n > 0 && m > 0 && SIZE_MAX/n >= m) {
    size_t bytes = n * m;
    /* allocate “bytes” space */
}

Sample 3.

if (n > 0 && m > 0 && SIZE_MAX/n >= m) {
    size_t bytes = (size_t)n * (size_t)m;
    /* allocate “bytes” space */
}

I think they're all wrong, but not all for the same reason. So what would be correct?

These snippets are taken from here.


Edited to emphasise that the main issue is multiplying signed integers, which can lead to Undefined Behaviour (which unsigned does not).

I now think that the last sample works correctly provided that integer, signed integer, size_t and SIZE_MAX have the "usual" values, or at least that they comply with relevant standards.

david.pfx
  • 10,520
  • 3
  • 30
  • 63
  • 2
    The flaws of the first two methods are explained in the article that you linked to. Why do you think that method #3 is wrong? – Martin R Apr 19 '14 at 06:28
  • 2
    Does [this](http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflow-in-c-c) link help at all? – ultifinitus Apr 19 '14 at 06:35
  • 1
    @MartinR: Because the article lacks the kind of rigour I've come to expect from language lawyers, and I found I couldn't write a good enough answer myself. Sample 3 is trivially wrong, as per the article. Even corrected, I think it may have a subtle flaw. – david.pfx Apr 19 '14 at 06:49
  • @ultifinitus: Great article, but mostly dealing with unsigned arithmetic. I couldn't find a good answer to this question from it. – david.pfx Apr 19 '14 at 06:50
  • @david.pfx: I am probably overlooking something obvious, but I still cannot see why *"Sample 3 is trivially wrong"*. It is presented as "Code that works" in the article. – Martin R Apr 19 '14 at 06:52
  • The article itself says why option 3 is 'trivially wrong'. n is divided by SIZE_MAX and compared to m. While SIZE_MAX is the maximum value of size_t, the type of m and n is not specified. They can well be any type, e.g. unsigned, you can't be sure until casted on the multiply. In C, testing things of different types is a recipe for mistakes. Something I find curious is that Apple's article point to a vulnerability that talk about compilers making assumptions about pointers, when size_t is an integer. I still have to see a compiler that optimizes a valid integer test to make it a bug. – guilleamodeo Apr 19 '14 at 08:43
  • @guilleamodeo: m and n are (in this question) assumed to be a signed int. SIZE_T is a size_t which is an unsigned integer and may have the same size as int or larger size. As I understand it, all operands in the expression `SIZE_MAX/n >= m` are promoted to the larger type, which is size_t. - And (pardon my ignorance) I still cannot find where the article says that option 3 is trivially wrong. – Martin R Apr 19 '14 at 09:13
  • @guilleamodeo: Small correction: `size_t` may have a larger, the same, or a smaller size than `int`. But in any case I do not see how `if (n > 0 && m > 0 && SIZE_MAX/n >= m)` could give a wrong result. And **if** the condition is true, then `size_t bytes = (size_t)n * (size_t)m;` should give the correct result. – Martin R Apr 19 '14 at 09:25
  • @Martin R, I am not saying that it will give the wrong value, but that is wrong because it leads to undefined behavior. If you read the code snippets, only sample 1 shows the definition of m and n, and the others cast n and m to size_t in the multiply, thus suggesting they might not be size_t, so the test with SIZE_MAX happens before the cast. Actually, from the point of view of and old hag like me, Apple's article is wrong because says that a compiler will fail to honor a perfectly valid integer test, using a vulnerability on a pointer test to legit their claims. Comparing Apples to Oranges. – guilleamodeo Apr 19 '14 at 09:41
  • @MartinR: I may have misunderstood the relationship between the relative sizes of int, unsigned, size_t and SIZE_MAX, and I think there are combinations for which the compiler can discard the last part of the if. But I think you're right, the answer is still correct. – david.pfx Apr 19 '14 at 14:41
  • The most voted answer in [@ultifinitus's link](http://stackoverflow.com/a/199455/995714) already provide a quick answer for multiplication overflow. Just find the highest set bit, which is only an instruction on some architectures, and sum – phuclv Apr 20 '14 at 09:43
  • You should edit the title as there are many integer operations that can overflow such as addition, division... not only multiplication like your question's content. There are already many questions about multiplication overflow detection http://stackoverflow.com/questions/1815367/multiplication-of-large-numbers-how-to-catch-overflow – phuclv Apr 20 '14 at 09:47
  • @lu: Head Geeks's answer is clever, but the code is long and slow, it only works for unsigned, and it fails for some values. I would never use it. – david.pfx Apr 20 '14 at 10:46
  • @lu: Title: OK. But the link is to a Q&A that only work for unsigned. – david.pfx Apr 20 '14 at 10:47
  • @david.pfx The second answer in the link deals with signed integers: http://stackoverflow.com/a/1514309/1710392 – Étienne Apr 20 '14 at 11:03
  • 1
    @Étienne: Yes, it does but (a) it doesn't deal with multiplication explicitly and (b) it uses INT_MAX rather than SIZE_MAX as here, which is important. – david.pfx Apr 20 '14 at 14:00

2 Answers2

0

1 and 2 are wrong because n * m may overflow. The fact that you go on to assign it to a size_t doesn't "reverse" the overflow.

I think 3 is correct. (The cast on m is superfluous BTW). If anyone disagrees please post sample values for m and n and SIZE_MAX that makes the test incorrect!

Also I cannot see a problem with:

if ( m > 0 && n > 0 && (size_t)m * n / n == m )
    size_t bytes = (size_t)m * n;
M.M
  • 138,810
  • 21
  • 208
  • 365
  • In the last sample the compiler is free to simplify/optimise/reorder the `*/` calculation, taking advantage of the identity that `n/n==1`. Only the final result is reduced modulo max+1; intermediate results have no such requirement. The expression is trivially true. – david.pfx Apr 21 '14 at 05:42
  • 2
    @david, no it isn't. The result has to be generated "as-if" the calculation was done in the order specified. A compiler that "optimized" that away would be non-conforming. – M.M Apr 21 '14 at 06:41
  • That's not the wording in the standard, and the standard is not explicit about how modulo arithmetic would be applied in this case, especially if `n` has more bits than `size_t`. In practice I think you're right, but I'm still not so sure about the theory. – david.pfx Apr 21 '14 at 23:36
  • 1
    I'm assuming `sizeof(int) <= sizeof(size_t)`.Other than that I'm sure about the theory, it explicitly says that unsigned arithmetic is performed modulo `SIZE_MAX+1` etc. and 5.1.2.3 (C99) covers re-ordering of operations. Example 6 is relevant. – M.M Apr 21 '14 at 23:53
  • I prefer casting 'm' and 'n' to size_t for symmetry, but I agree that casting both is not necessary. It's a matter of taste. I'm the author of the blog post linked to in the question. http://randomascii.wordpress.com/2014/04/17/buggy-security-guidance-from-apple/ – Bruce Dawson Apr 29 '14 at 21:03
0

Sample 1 is incorrect, because the multiplication of m * n may exceed the range of representable values for the signed integer type, which leads to Undefined Behaviour. See n1570 S6.5/5:

If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.

This is of practical importance since one of the things the compiler is permitted to do is to assume that Undefined Behaviour has not happened, and it may use this to optimise the code so that the if statement disappears and is of no effect.

Note that this applies only to the signed type. Unsigned arithmetic does not overflow. See n1570 S6.2.5/9.

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

Sample 2 is incorrect, because the multiplication of m * n is evaluated as a signed integer expression, which may again result in Undefined Behaviour. However in this case there is less likelihood of a practical effect. The result will most likely produce the correct bit-wise result and once cast to an unsigned the value is likely to be correct. The sample is wrong, but better, at least for 32 bit processors.

Edit: The above para will be true if unsigned int and size_t are the same size, as they usually are for 32 bit processors. Sample 2 will not work correctly where size_t is larger than unsigned int, as it usually is on 64 bit processors. The signed multiply will overflow and produce wrong behaviour for some values.

Sample 3 is almost certainly correct. With the multiplication of m * n evaluated as an unsigned integer expression, there is no possibility of Undefined Behaviour. [As noted in another answer, only one cast operation is actually required.]

The exact behaviour here is subtly dependent on the definitions of int, unsigned int, size_t and SIZE_MAX. The usual assumption is that size_t is the same as unsigned int (or perhaps larger) and that SIZE_MAX is an unsigned integer literal, but these are not actually requirements. Indeed, SIZE_MAX is only required to be at least 65535. At least size_t is required to be unsigned.

Since I have not been able to find any plausible (or even implausible) definitions for them that would cause the sample to fail, on that basis I say that Sample 3 works correctly.

david.pfx
  • 10,520
  • 3
  • 30
  • 63
  • Sample 2 is *not* better. It is a complete disaster, at least when compiling for a 64-bit architecture. The most likely result is that the signed multiply will overflow and the high 32 bits will be discarded. When compiled for 64-bit the SIZE_MAX test will *always* pass and the code offers zero protection (well, it protects against non-positive, but that's it). Undefined behavior isn't the only problem. Correct behavior also matters, and sample 2 is incorrect for 64-bit architectures. See this post for details: http://randomascii.wordpress.com/2014/04/17/buggy-security-guidance-from-apple/ – Bruce Dawson Apr 29 '14 at 20:59
  • If `size_t` is `unsigned short`, sample 3 may fail. For example if `size_t` is 16 bits and `int` is 32 bits, then `size_t(65535) * size_t(65535)` causes undefined behavior. I don't know of a platform in which `size_t` is not of rank at least `unsigned int`, though. – Myria Feb 22 '16 at 19:41
  • @Myria: if `size_t` is 16 bits `unsigned short` then `SIZE_MAX` is 65535, per n1570 S7.20.3, the `if()` expression will evaluate to false and the multiplication will not be performed. – david.pfx Feb 23 '16 at 01:07