6

In the late drafts of C11 [C11_N1570] and C17 [C17_N2176] I cannot find a proof of the following (which, I believe, is commonly known):
sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)
Can anybody refer me to the particular sections?

I'm aware of this reply for C++11. The second part of the reply talks about C, but it only touches on the ranges of the values. It does not prove the relation between the type sizes.

Robin Kuzmin
  • The range of values should be sufficient to draw the conclusion about the respective sizes for the types, is it not? – Sourav Ghosh May 21 '21 at 05:27
  • §5.2.4.2... The ranges *dictate* the size but only as it relates to a particular ISA. Everything else is subjective. Since the ranges are set, the sizes are consequently set. – tadman May 21 '21 at 05:28
  • @tadman: That specifies the ranges. But could we have a DeathStation 9000 implementation where `USHRT_MAX` is 65535 and `UINT_MAX` is `4294967295`, but `sizeof(unsigned int) == 4` while `sizeof(unsigned short) == 27` due to a ton of padding bits? It would be incredibly dumb but I'm not sure it's illegal. – Nate Eldredge May 21 '21 at 05:32
  • @tadman: I believe the range of values and type size are unrelated (until the opposite is proved). A type can be bigger (8 bytes), but have smaller range of values (occupying 2 least significant bytes only). The other type can be smaller - 4 bytes - but have larger range of values - occupying all 4 bytes. – Robin Kuzmin May 21 '21 at 05:38
  • If you can fit 65,536 distinct values into an 8 bit register then you know something I don't. – tadman May 21 '21 at 05:41
  • What this is saying is you can have them all the same size if you want to be super lazy, or your ISA constrains you as such, but you're also free to choose the size on your ISA that best fits the required ranges. Nothing here prevents you from implementing C on some wonky system with, say, 18 bit words. – tadman May 21 '21 at 05:42
  • @tadman: From the DR260 Committee Response it follows that a bit pattern in an object representation doesn't uniquely determine the value. Different values may be represented by the same bit pattern. So I think an implementation with `CHAR_BIT` == 8 and `sizeof(int)` == 1 is possible. – Language Lawyer May 21 '21 at 06:34
  • @LanguageLawyer That will only allow for 2^8 possible values, which for any normal value of 2 leads to an outcome of 256. Not sure how you can cram 65,535 possible values in that space short of quantum-mechanical magic. – tadman May 21 '21 at 06:43
  • The standard does not provide a guaranteed size for `int` or `size_t`, etc.. It provides a minimum for the type. The remainder is left to the implementation. – David C. Rankin May 21 '21 at 06:54
  • I think that the standard only guarantees that `sizeof(X) >= 1` and a minimal number of non-padding bits for a given type. Theoretically, it should be possible to have a 4-byte-long `short` with 16 padding bits and a 2-byte-long `int` with no padding. It would be absurd, though compliant with the C standard. – tstanisl May 21 '21 at 08:50
  • @tadman different values don't have to be represented by different bit patterns. Thus, there can be more values than possible bit patterns. – Language Lawyer May 21 '21 at 09:22
  • @LanguageLawyer Regardless of the storage mechanism, binary or otherwise, the size rules still apply. I'm not sure why you're getting so extraordinarily pedantic here. If you're suggesting that you don't need a unique "pattern" per number, I have no idea what you're suggesting. – tadman May 21 '21 at 09:37
  • @LanguageLawyer: Re “different values don't have to be represented by different bit patterns”: What? In one given type? C 2018 3.19 defines “value” as “precise meaning of the contents of an object when interpreted as having a specific type”. The contents of an object are its bits. Given any one bit pattern in any one type, there can be only one value for it, that being the “precise meaning of the contents of an object when interpreted as having a specific type”. (There can be less than one value, as it can be a trap representation, but there cannot be more than one.) Show an example. – Eric Postpischil May 21 '21 at 10:32
  • This is not binding, but I will point out that the title of the clause about `<limits.h>` is “Sizes of integer types <limits.h>”, thus showing the intent is to describe the sizes of the types, not merely their ranks or capacities. – Eric Postpischil May 21 '21 at 10:37
  • @EricPostpischil IIUC, when there is a contradiction between the standard and a committee response (CR), implementors go with the committee response. An example is UB from reading an indeterminate value: the standard says that it is UB only if the indeterminate value happens to be a trap representation; a CR says that an indeterminate value is a notional value and reading it is always UB (unless through a char type etc.). AFAIR the Clang and GCC sanitizers act according to the CR. – Language Lawyer May 21 '21 at 13:29
  • @LanguageLawyer: The fact that a “value” is indeterminate does not mean whatever bits are in the memory normally reserved for it represent multiple values. The lack of having a determinate value arises out of relieving the compiler of the need to read memory that has not been initialized, so it might use whatever is in some register or other cached location (and later be subject to further optimization), thus getting bits due to happenstance and using whatever value those bits represent—it does not arise from any interpretation of specific bits to mean different values. – Eric Postpischil May 21 '21 at 14:20
  • @EricPostpischil _The fact that a “value” is indeterminate does not mean whatever bits are in the memory normally reserved for it represent multiple values_ Not sure I understand what is written here. DR260CR says «Values may have any bit-pattern that validly represents them... In the case of an indeterminate value all bit-patterns are valid representations ...». Since usually some of these bit-patterns may also represent some other value (not indeterminate), a bit-pattern+type doesn't uniquely identify a value. – Language Lawyer May 24 '21 at 05:52
  • @EricPostpischil I didn't claim that an object has multiple values **at the same time** because the bit-pattern inside it can represent more than one value. – Language Lawyer May 24 '21 at 05:53
  • @LanguageLawyer Will you stipulate that, _if an object's value is determinate_, then the pattern of its value bits must represent _at most_ one mathematical value? (Some determinate objects may have a pattern of value bits that does not correspond to any value of the type, e.g. `_Bool x; memset(&x, 0xAA, sizeof x);` probably produces such an object.) – zwol May 24 '21 at 20:44
  • @LanguageLawyer: The language in that defect report is an abuse of English that they need to straighten out before putting it into the standard. To refer to something as an “indeterminate value” or an “unspecified value” is, expressed properly, a description of a state in which the value is not defined and we are only using the phrasing to describe how the program may act, not to specify what the value is. By analogy, when we speak of “the sum of an infinite series,” it is just a shorthand for a formal definition in which there is no actual infinity, just a limit we can prove the series… – Eric Postpischil May 25 '21 at 01:30
  • … approaches as the number of terms summed increases. When they put it into the standard, they ought to define it similarly. In the meantime, that committee response is not in the standard. Further, it is not relevant here, as neither indeterminate values nor unspecified values are at issue. The discussion is purely about normal values represented by defined bits, and, in this normal situation, there is no way to get more than 2^n values from n bits or for any one bit pattern to represent multiple values… – Eric Postpischil May 25 '21 at 01:31
  • … And no, it would not be possible for a conforming C implementation to have both `CHAR_BIT` equal to 8 and `sizeof (int)` to be 1. Those figurative/notional “indeterminate values” or “unspecified values” cannot be used to provide the required normal `int` values from −32,767 to +32,767. Talk of multiple values with one bit pattern is rubbish, a distracting sideshow that does not contribute to the conversation. – Eric Postpischil May 25 '21 at 01:35

4 Answers

2

Thank you very much to everybody who participated in the search for the answer. Most of the replies shared what I had already learned, but some of the comments provided very interesting insights.
Below I summarize what I have learned so far (for my own future reference).


Conclusion

It looks like C (as of the late draft of C17 [C17_N2176]) does not guarantee that
sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)
(as opposed to C++).

What is Guaranteed

Below is my own interpretation/summary of what C does guarantee regarding the integer types (sorry if my terminology is not strict enough).

Multiple Aliases For the Same Type

This topic gets the multiple aliases for the same type out of my way ([C17_N2176], 6.2.5/4, the parenthesized sentence referring to 6.7.2/2; thanks @M.M for the reference).

The Number of Bits in a Byte

The number of bits in a byte is implementation-defined and is >= 8. It is given by the CHAR_BIT macro.
5.2.4.2.1/1 Sizes of integer types <limits.h>

Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8

The text below assumes that a byte is 8 bits (keep that in mind on implementations where a byte has a different number of bits).
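
For example, here is a small sketch of mine (not from the standard) that prints CHAR_BIT and the byte sizes of the standard integer types on a particular implementation:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_BIT is the implementation-defined number of bits in a byte (at least 8). */
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("sizeof(short)     = %zu\n", sizeof(short));
    printf("sizeof(int)       = %zu\n", sizeof(int));
    printf("sizeof(long)      = %zu\n", sizeof(long));
    printf("sizeof(long long) = %zu\n", sizeof(long long));
    return 0;
}

On a typical x86-64 Linux implementation this prints CHAR_BIT = 8 and the sizes 2, 4, 8, 8.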

The sizeof([[un]signed] char)

sizeof(char), sizeof(unsigned char), and sizeof(signed char) are 1 (byte).
6.5.3.4/2 The sizeof and _Alignof operators

The sizeof operator yields the size (in bytes) of its operand

6.5.3.4/4:

When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.
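
A minimal compile-time sketch of mine (C11 and later) of that guarantee:

/* 6.5.3.4/4: sizeof of the character types is 1 on every conforming implementation. */
_Static_assert(sizeof(char) == 1, "6.5.3.4/4");
_Static_assert(sizeof(signed char) == 1, "6.5.3.4/4");
_Static_assert(sizeof(unsigned char) == 1, "6.5.3.4/4");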

The Range of the Values and the Size of the Type

Objects may not use all of their bits to store a value
The object representation has value bits, may have padding bits, and for the signed types has exactly one sign bit (6.2.6.2/1/2 Integer types). E.g. a variable can have a size of 4 bytes, but only the 2 least significant bytes may be used to store the value (the object representation has only 16 value bits), similar to how the bool type has at least 1 value bit while all the other bits are padding bits.
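
Here is a small sketch of mine (relying only on <limits.h>) that detects such padding bits in unsigned int, by comparing the value bits implied by UINT_MAX with the total bits in the object:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* UINT_MAX == 2^N - 1 where N is the number of value bits, so halving it N times reaches 0. */
    unsigned long long max = UINT_MAX;
    int value_bits = 0;
    while (max != 0) {
        max >>= 1;
        ++value_bits;
    }
    int object_bits = (int)(sizeof(unsigned int) * CHAR_BIT);
    printf("unsigned int: %d value bits, %d object bits, %d padding bits\n",
           value_bits, object_bits, object_bits - value_bits);
    return 0;
}

On most current implementations this reports 0 padding bits, but the standard permits more.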

The correspondence between the range of the values and the size of the type (or the number of value bits) is debatable.
On the one hand, @eric-postpischil refers to 3.19/1:

value
precise meaning of the contents of an object when interpreted as having a specific type

This gives the impression that every value has a unique bit representation (bit pattern).

On the other hand, @language-lawyer states:

different values don't have to be represented by different bit patterns. Thus, there can be more values than possible bit patterns.

when there is a contradiction between the standard and a committee response (CR), implementors go with the committee response

from the DR260 Committee Response it follows that a bit pattern in an object representation doesn't uniquely determine the value. Different values may be represented by the same bit pattern. So I think an implementation with CHAR_BIT == 8 and sizeof(int) == 1 is possible.

I didn't claim that an object has multiple values at the same time

@language-lawyer's statements give the impression that multiple values (e.g. 5, 23, -1), probably at different moments of time, can correspond to the same bit pattern (e.g. 0xFFFF) of the value bits of a variable. If that's true, then the integer types other than [[un]signed] char (see "The sizeof([[un]signed] char)" section above) can have any byte size >= 1: they must have at least one value bit, which (paranoidly strictly speaking) prevents them from having a byte size of 0 and thus gives a size of at least one byte, and the whole range of values (mandated by <limits.h>, see below) could correspond to that "at least one value bit".

To summarize, the relation between sizeof(short), sizeof(int), sizeof(long), and sizeof(long long) can be anything: any of these, in byte size, can be greater than or equal to any of the others (again, somewhat paranoidly strictly speaking).

What Does Not Seem Arguable

What has not been mentioned yet is 6.2.6.2/1/2 Integer types:

For unsigned integer types .. If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation ..

For signed integer types .. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type ..

This makes me believe that each value bit adds a unique value to the overall value of the object. E.g. the least significant value bit (I'll call it value bit number 0), regardless of where in the byte(s) it is located, adds a value of 2^0 == 1, and no other value bit adds that value, i.e. the value is added uniquely. Value bit number 1 (again, regardless of its position in the byte(s), as long as that position differs from the position of any other value bit) uniquely adds a value of 2^1 == 2.
These two value bits together sum up to an overall absolute value of 1 + 2 == 3.

Here I won't dig into whether they add a value when set to 1, when cleared to 0, or in some combination of those. In the text below I assume that they add their value when set to 1.

Just in case I'll also quote 6.2.6.2/2 Integer types:

If the sign bit is one, the value shall be modified in one of the following ways:
...
— the sign bit has the value -(2^M) (two’s complement);

Earlier in 6.2.6.2/2 it is mentioned that M is the number of value bits in the signed type.
Thus, if we are talking about an 8-bit signed value with 7 value bits and 1 sign bit, then the sign bit, if set to 1, adds the value of -(2^M) == -(2^7) == -128.

Earlier I considered an example where the two least significant value bits sum up to an overall absolute value of 3. Together with the sign bit set to 1, for the 8-bit signed value with 7 value bits the overall signed value is -128 + 3 == -125.
As an example, that value can have the bit pattern 0x83 (the sign bit is set to 1 (0x80), the two least significant value bits are set to 1 (0x03), and both value bits add to the overall value when set to 1, rather than when cleared to 0, in the two's complement representation).
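
A tiny sketch of mine illustrating this -125 example, assuming CHAR_BIT == 8 and a two's complement signed char with no padding bits:

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char pattern = 0x83;   /* sign bit (0x80) plus value bits number 0 and 1 (0x03) */
    signed char value;
    memcpy(&value, &pattern, 1);    /* reinterpret the object representation as signed char */
    /* The sign bit contributes -(2^7) == -128, the two value bits contribute 1 + 2 == 3. */
    printf("%d\n", (int)value);     /* prints -125 under the assumptions above */
    return 0;
}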

This observation makes me think that, very likely, there is a one-to-one correspondence between the range of values and the number of value bits in an object: every value has a unique pattern of value bits and every pattern of value bits maps to a single value.
(I realize that this intermediate conclusion may still not be strict enough, may be wrong, or may not cover certain cases.)

Minimum Number of Value Bits and Bytes

5.2.4.2.1/1 Sizes of integer types <limits.h>:
Important sentence:

Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

Then:

SHRT_MIN -32767 // -(2^15 - 1)
SHRT_MAX +32767 // 2^15 - 1
USHRT_MAX 65535 // 2^16 - 1

This tells me that
short int has at least 15 value bits (see SHRT_MIN, SHRT_MAX above), i.e. at least 2 bytes (if byte is 8 bits, see "The Number of Bits in a Byte" above).
unsigned short int has at least 16 value bits (USHRT_MAX above), i.e. at least 2 bytes.

Continuing that logic (see 5.2.4.2.1/1):
int has at least 15 value bits (see INT_MIN, INT_MAX), i.e. at least 2 bytes.
unsigned int has at least 16 value bits (see UINT_MAX), i.e. at least 2 bytes.
long int has at least 31 value bits (see LONG_MIN, LONG_MAX), i.e. at least 4 bytes.
unsigned long int has at least 32 value bits (see ULONG_MAX), i.e. at least 4 bytes.
long long int has at least 63 value bits (see LLONG_MIN, LLONG_MAX), i.e. at least 8 bytes.
unsigned long long int has at least 64 value bits (see ULLONG_MAX), i.e. at least 8 bytes.

This proves to me that:
1 == sizeof(char) < any of { sizeof(short), sizeof(int), sizeof(long), sizeof(long long) } (given the 8-bit-byte assumption above).
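
A runtime sketch of mine that counts the value bits an implementation actually provides, by counting the bits of the *_MAX macros, and compares them with the byte sizes:

#include <limits.h>
#include <stdio.h>

/* Count the value bits of an unsigned type from its maximum value (2^N - 1 for N value bits). */
static int value_bits(unsigned long long max)
{
    int bits = 0;
    while (max != 0) {
        max >>= 1;
        ++bits;
    }
    return bits;
}

int main(void)
{
    printf("unsigned short    : %d value bits (>= 16 required), %zu bytes\n",
           value_bits(USHRT_MAX), sizeof(unsigned short));
    printf("unsigned int      : %d value bits (>= 16 required), %zu bytes\n",
           value_bits(UINT_MAX), sizeof(unsigned int));
    printf("unsigned long     : %d value bits (>= 32 required), %zu bytes\n",
           value_bits(ULONG_MAX), sizeof(unsigned long));
    printf("unsigned long long: %d value bits (>= 64 required), %zu bytes\n",
           value_bits(ULLONG_MAX), sizeof(unsigned long long));
    return 0;
}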

The sizeof(int)

6.2.5/5 Types

A "plain" int object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range INT_MIN to INT_MAX as defined in the header <limits.h>).

This suggests to me that (if a byte is 8 bits):
sizeof(int) is typically 4 on a 32-bit architecture;
sizeof(int) may be 8 on a 64-bit architecture, although in practice most 64-bit implementations keep sizeof(int) == 4.

The sizeof(unsigned T)

6.2.5/6 Types

For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.

This proves to me that:
sizeof(unsigned T) == sizeof(signed T).
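
A compile-time sketch of mine of that equality:

/* 6.2.5/6: each unsigned type uses the same amount of storage as the corresponding signed type. */
_Static_assert(sizeof(unsigned short) == sizeof(short), "6.2.5/6");
_Static_assert(sizeof(unsigned int) == sizeof(int), "6.2.5/6");
_Static_assert(sizeof(unsigned long) == sizeof(long), "6.2.5/6");
_Static_assert(sizeof(unsigned long long) == sizeof(long long), "6.2.5/6");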

The Ranges of Values

6.2.5/8 Types

For any two integer types with the same signedness and different integer conversion rank (see 6.3.1.1), the range of values of the type with smaller integer conversion rank is a subrange of the values of the other type.

(See the discussion of 6.3.1.1 below.)
I assume that a subrange of values contains the same number of values as, or fewer values than, the range. I.e. the type with the smaller conversion rank can have the same number of values as, or fewer values than, the type with the greater conversion rank.

6.3.1.1/1 Boolean, characters, and integers

— The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, which shall be greater than the rank of short int, which shall be greater than the rank of signed char.
— The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type, if any.
— The rank of _Bool shall be less than the rank of all other standard integer types.
— The rank of any enumerated type shall equal the rank of the compatible integer type (see 6.7.2.2).

This tells me that:
range_of_values(bool) <= range_of_values(signed char) <= range_of_values(short int) <= range_of_values(int) <= range_of_values(long int) <= range_of_values(long long int).
For the unsigned types the relation between the ranges of values is the same.
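
Since 5.2.4.2.1 requires the *_MIN/*_MAX values to be usable in #if directives, this range nesting can even be checked in the preprocessor (a sketch of mine):

#include <limits.h>

/* 6.2.5/8 + 6.3.1.1: each lower-ranked type's range is a subrange of the higher-ranked type's range. */
#if SHRT_MAX > INT_MAX || INT_MAX > LONG_MAX || LONG_MAX > LLONG_MAX
#error "signed ranges are not nested as required"
#endif
#if USHRT_MAX > UINT_MAX || UINT_MAX > ULONG_MAX || ULONG_MAX > ULLONG_MAX
#error "unsigned ranges are not nested as required"
#endif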

This establishes the same relation for the number of value bits in the types.

But this still does not prove the same relation between the sizes in bytes of objects of those types.
I.e. C (as of [C17_N2176]) does not guarantee the following statement (as opposed to C++):
sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)

Robin Kuzmin
  • Conversion rank determines the sizes. Sure, nothing stops a dysfunctional compiler from making, let's say, short 100 bytes large but only utilizing 2 of those. But why would anyone do that just for the sake of it? Why wouldn't the range correspond to the size in bytes? There's no requirement in the C standard that compiler vendors must use common sense, but that alone doesn't mean we should assume that compilers are designed without common sense. – Lundin May 27 '21 at 07:39
  • @Lundin, > "_Conversion rank determines the sizes_". Can you please refer me to the particular sections of the standard that confirm that? – Robin Kuzmin May 28 '21 at 00:14
  • I already did that in my answer. "the range of values of the type with smaller integer conversion rank is a subrange of the values of the other type". So for example the values of `short` is a subrange of the values of `int`. This doesn't prevent the implementation from making short 666 bytes large but only using 2, but why would you do that? We have to assume that the compiler implementers are capable of using common sense. Discussions about padding bits are mostly nonsensical as well since all sane, modern systems use 2's complement. – Lundin May 28 '21 at 06:26
1

6.2.6.2 Integer types starts by defining value bits and padding bits for the unsigned integer types (except for unsigned char).

Not much is said about padding bits, except that there don't have to be any at all. But there can be more than one, unlike the sign bit for signed types.


There is no common-sense rule against over-padding a short until it gets longer than a long, whether long has more value bits or not.


The direct implicit relation between the number of (value) bits and the maximum value also shows in the title of 5.2.4.2.1 Sizes of integer types <limits.h>. That clause defines minimum magnitudes for the maximum values, not object sizes (except for CHAR_BIT).

The rest lies in the names themselves and in the hands of the implementation: short and long, not small and large. It is nicer to say "I am a space-saving integer" than "I am an integer with a reduced maximum value".

-1

From an initial examination of ISO/IEC 9899:2017 (your C17 [C17_N2176] link):

  1. Section "5.2.4.2.1 Sizes of integer types <limits.h>" gives ranges of the form ±(2^n − 1), which indicates the minimum number of value bits for the type.
  2. Section "6.2.5 Types" point 5 says '... A “plain” int object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range INT_MIN to INT_MAX as defined in the header <limits.h>).'

This makes me think the ranges specify the smallest size in bits that the type can be. Maybe some architectures allot sizes greater than this smallest size.

Dharman
-2

The relevant parts are:

The environmental limits and limits.h, from C17 5.2.4.2.1 "Sizes of integer types <limits.h>". If we look at unsigned types only, then the minimum values the implementation at least needs to support are:

UCHAR_MAX 255
USHRT_MAX 65535
UINT_MAX 65535
ULONG_MAX 4294967295
ULLONG_MAX 18446744073709551615

Then check the part C17 6.3.1.1 regarding integer conversion rank (also see Implicit type promotion rules):

  • The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, which shall be greater than the rank of short int, which shall be greater than the rank of signed char.
  • The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type, if any.

And then finally, C17 6.2.5/8 is the normative text stating that the range of every type with lower conversion rank is a subrange of the range of the higher-ranked type:

For any two integer types with the same signedness and different integer conversion rank (see 6.3.1.1), the range of values of the type with smaller integer conversion rank is a subrange of the values of the other type.

To satisfy this requirement, we must have:

sizeof(unsigned char) <= 
sizeof(unsigned short) <= 
sizeof(unsigned int) <= 
sizeof(unsigned long) <= 
sizeof(unsigned long long)
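
Whether this chain actually holds on a particular implementation can be verified at compile time; a quick sketch using C11 _Static_assert:

_Static_assert(sizeof(unsigned char) <= sizeof(unsigned short), "size chain");
_Static_assert(sizeof(unsigned short) <= sizeof(unsigned int), "size chain");
_Static_assert(sizeof(unsigned int) <= sizeof(unsigned long), "size chain");
_Static_assert(sizeof(unsigned long) <= sizeof(unsigned long long), "size chain");
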
Lundin
  • The question asks about sizes, and nothing in this answer proves any relationship between the sizes. `unsigned short` could have 16 value bits and 48 padding bits while `unsigned int` has 32 value bits and no padding bits. This satisfies the rank and range requirements but does not produce `sizeof (unsigned short)` ≤ `sizeof (unsigned int)`. Further, the question asks for proof. – Eric Postpischil May 21 '21 at 14:23