17

This declaration compiles without warnings in g++ -pedantic -Wall (version 4.6.3):

std::size_t foo = -42;

Less visibly bogus is declaring a function with a size_t argument, and calling it with a negative value. Can such a function protect against an inadvertent negative argument (which appears as umpteen quintillion, obeying §4.7/2)?
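For concreteness, something like the following (my own illustration; the function name is made up) compiles without complaint and quietly receives a huge value:

#include <cstddef>
#include <cstdio>

void reserve_bytes(std::size_t n) {   // hypothetical function taking a size_t
    std::printf("asked to reserve %zu bytes\n", n);
}

int main() {
    int delta = -42;
    reserve_bytes(delta);  // no warning from g++ -pedantic -Wall;
                           // with a 64-bit size_t this prints 18446744073709551574
}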

Incomplete answers:

Just changing size_t to (signed) long discards the semantics and other advantages of size_t.

Changing it to ssize_t is merely POSIX, not Standard.

Changing it to ptrdiff_t is brittle and sometimes broken.

Testing for huge values (high-order bit set, etc.) is arbitrary; one such check is sketched just below.
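(For reference, the kind of huge-value test meant in the last bullet might look like this sketch of mine; the SIZE_MAX / 2 cutoff is precisely the arbitrary choice being criticized.)

#include <cstddef>
#include <cstdint>   // SIZE_MAX

// Sketch only: flag any value whose high-order bit is set, i.e. anything a
// same-width signed type would read as negative.
bool looks_negative(std::size_t n) {
    return n > SIZE_MAX / 2;   // same as testing the top bit
}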

Cody Gray - on strike
Camille Goudeseune
  • Related http://stackoverflow.com/questions/2711522/what-happens-if-i-assign-a-negative-value-to-an-unsigned-variable – halex Mar 29 '13 at 19:31
  • 2
    +1 for "umpteen quintillion". Jk, +1 because it's a good question. – Nicu Stiurca Mar 29 '13 at 19:34
  • Thank you, halex. I just cited the standard from there. – Camille Goudeseune Mar 29 '13 at 19:36
  • Testing for high-order bit is not arbitrary, it is standard. – alecov Jan 18 '16 at 17:38
  • IMO `unsigned` is evil. I've seen terrible bugs caused by something like `while(size_t index < container.size()-1)` triggering on empty `container`. I understand that `unsigned` types may be useful sometimes when available bits are few, but having a widely used general purpose integer type `size_t` being defined as `unsigned` rather than `int` was a huge horrible humongous mistake of epic proportions. Sorry for the rambling, it just happens to be a pet peeve of mine. `unsigned` ... urgh. – Michael Jan 18 '16 at 18:47
  • 1
    Use a class like [this](https://godbolt.org/z/4hYq9b3Yh) – n. m. could be an AI Sep 24 '22 at 20:36
  • @n.1.8e9-where's-my-sharem that's worth posting as an answer. – Camille Goudeseune Sep 26 '22 at 19:22
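(The godbolt link in the comment above isn't reproduced here. Purely as a hypothetical sketch of that class-based idea, with names and details invented for illustration, such a wrapper might validate at construction and convert to size_t on demand:)

#include <cstddef>
#include <stdexcept>

// Hypothetical checked wrapper: building it from a negative value throws, so a
// function can take CheckedSize instead of a raw size_t and never see
// "umpteen quintillion".
class CheckedSize {
public:
    CheckedSize(long long n) {               // implicit, so call sites stay terse
        if (n < 0)
            throw std::invalid_argument("negative size");
        value_ = static_cast<std::size_t>(n);
    }
    operator std::size_t() const { return value_; }
private:
    std::size_t value_ = 0;
};

void resize_buffer(CheckedSize n) { static_cast<void>(n); }  // hypothetical consumer
// resize_buffer(-42) now throws at run time instead of silently wrapping.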

2 Answers

4

The problem with issuing a warning for this is that it's not undefined behavior according to the standard. If you convert a signed value to an unsigned type of the same size (or larger), you can later convert that back to a signed value of the original signed type and get the original value[1] on any standards-compliant compiler.
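To illustrate the round trip (my own example, not part of the answer; as the comments below discuss, the back-conversion is implementation-defined before C++20, but it is modular, and therefore value-preserving, on every mainstream two's-complement implementation, and C++20 makes that behavior mandatory):

#include <cassert>
#include <cstddef>

int main() {
    long original = -42;
    std::size_t as_unsigned = original;              // well-defined: modular conversion
    long round_trip = static_cast<long>(as_unsigned);
    assert(round_trip == original);                  // -42 comes back out
    assert(static_cast<long>(as_unsigned) < 0);      // the cast-back-and-test idea
}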

In addition, using negative values converted to size_t is fairly common practice for various error conditions -- many library and system interfaces return an unsigned value (a size_t, say) on success and -1 (converted to unsigned) on error. So adding such a warning to the compiler would cause spurious warnings for much existing code. POSIX attempts to codify this with ssize_t, but that breaks calls that may legitimately succeed with a return value greater than the maximum value of ssize_t.
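A familiar example of that convention from the standard library (my addition, not the answer's):

#include <cstddef>
#include <string>

// std::string::npos is specified as size_type(-1): the all-ones bit pattern,
// i.e. exactly the "-1 converted to unsigned" sentinel described above.
static_assert(std::string::npos == static_cast<std::size_t>(-1),
              "npos is -1 converted to an unsigned type");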


[1] "original value" here actually means "a bit pattern that compares as equal to the original bit pattern when compared as that signed type" -- padding bits might not be preserved, and if the signed representation has redundant encodings (e.g., -0 and +0 in a sign-magnitude representation) it might be 'canonicalized'.

Chris Dodd
  • Runtime detection is a bigger issue here than compiler warnings. But you gave me the idea of, inside the function, casting the value back to signed and testing for <0. That works. So I'll call this answered. – Camille Goudeseune Mar 29 '13 at 20:44
  • @CamilleGoudeseune D'oh, converting back to signed and testing <0 is undefined behavior. Pick a maximum value, such as `LONG_MAX` or `std::numeric_limits< long >::max()`, and compare the unsigned value to that, with no typecasting. – Potatoswatter Mar 29 '13 at 22:59
  • @hvd: The whole point is that the spec does allow casting back and forth with same-size signed and unsigned types, and getting the same value, even though the ranges are different. – Chris Dodd Mar 29 '13 at 23:21
  • 1
    @hvd: But you're converting from a type with 65535 values to one with 65536 and then back to one with 65535. No problem there. – Chris Dodd Mar 30 '13 at 19:26
  • @ChrisDodd Oh, you're right, I misread what you were claiming. In that case, I must agree that you're correct that there is no technical reason why it would be impossible. Unfortunately, even if it is possible, it still isn't what the standard says. (Will comment in more detail in a bit.) –  Mar 30 '13 at 19:30
  • @ChrisDodd [conv.integral] says "If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type).", which requires `(unsigned)-1` to result in `UINT_MAX`. However, for converting that back to `int`, there is only "If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined." Assuming `UINT_MAX > INT_MAX`, `(int)UINT_MAX` may give any value. –  Mar 30 '13 at 19:36
  • @ChrisDodd To be clear, something like `unsigned u = -1; if (u == -2) abort();` is well-defined though. It's defined because `u` isn't converted to `int` when comparing, instead, `-2` is converted to `unsigned int`. –  Mar 30 '13 at 19:40
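(Pulling the comment thread together -- a sketch of mine, not part of the answer: the asker's cast-back test relies on the modular back-conversion discussed above, while Potatoswatter's version compares against an explicit signed maximum without converting the argument at all.)

#include <cstddef>
#include <limits>
#include <type_traits>

// The asker's check: reinterpret the bit pattern as the same-width signed type
// and test the sign. Implementation-defined before C++20 (but modular, hence
// reliable, on two's-complement implementations); guaranteed modular from C++20 on.
bool looks_like_converted_negative(std::size_t n) {
    return static_cast<std::make_signed<std::size_t>::type>(n) < 0;
}

// Potatoswatter's alternative: no cast of n; reject anything above a chosen
// signed maximum. Which maximum to use is a policy decision.
bool exceeds_chosen_maximum(std::size_t n) {
    return n > static_cast<std::size_t>(std::numeric_limits<long>::max());
}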
2

The following excerpt is from a private library.

#include <limits.h>
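
/* The macros below treat the highest-order bit of x as a sign bit:
   CORE_IS_POS(x) is true when x is nonzero and the top bit is clear ("> 0"),
   CORE_IS_NEG(x) is true when the top bit is set ("< 0").
   The bit mask is built from the widest unsigned constant available. */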

#if __STDC__ == 1 && __STDC_VERSION__ >= 199901L || \
    defined __GNUC__ || defined _MSC_VER
    /* Has long long. */
    #ifdef __GNUC__
        #define CORE_1ULL __extension__ 1ULL
    #else
        #define CORE_1ULL 1ULL
    #endif
    #define CORE_IS_POS(x) ((x) && ((x) & CORE_1ULL << (sizeof (x)*CHAR_BIT - 1)) == 0)
    #define CORE_IS_NEG(x) (((x) & CORE_1ULL << (sizeof (x)*CHAR_BIT - 1)) != 0)
#else
    #define CORE_IS_POS(x) ((x) && ((x) & 1UL << (sizeof (x)*CHAR_BIT - 1)) == 0)
    #define CORE_IS_NEG(x) (((x) & 1UL << (sizeof (x)*CHAR_BIT - 1)) != 0)
#endif

#define CORE_IS_ZPOS(x) (!(x) || CORE_IS_POS(x))
#define CORE_IS_ZNEG(x) (!(x) || CORE_IS_NEG(x))

This should work with all unsigned types.
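For example (a usage illustration of my own, not part of the answer), with the macro definitions above in scope:

#include <cstddef>
#include <cstdio>

int main() {
    std::size_t ok = 42;
    std::size_t oops = static_cast<std::size_t>(-42);             // a converted negative
    std::printf("%d %d\n", CORE_IS_POS(ok), CORE_IS_NEG(ok));     // prints 1 0
    std::printf("%d %d\n", CORE_IS_POS(oops), CORE_IS_NEG(oops)); // prints 0 1
}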

alecov
  • Nice use of CHAR_BIT, http://stackoverflow.com/a/3200969/2097284. But what implicit casting makes this work? `(x) & ...` casts x to ULL? `(x) &&` means `(x != 0) &&`? And ZPOS means `>= 0` ? – Camille Goudeseune Jan 19 '16 at 16:57
  • 1
    Yes, `IS_POS`/`IS_NEG` tests for `> 0` and `< 0`; and `IS_ZPOS`/`IS_ZNEG` tests for `>= 0` and `<= 0`. There's no cast involved; the macros merely test the highest order bit (the `(x) & ...` part) using the largest unsigned integer type available to construct the bitmask expression (hence UL and ULL). The `(x) && ...` part means simply `(x) != 0`. – alecov Jan 19 '16 at 17:33