First of all, it must be noted that C was invented during a very early computer era, based on the B and BCPL languages from the 1960s. Lots of different experimental computers existed back then - nobody quite knew which ones would survive or become the industry standard.
Because of this, the C language even supports three different signed number formats: 1's complement, 2's complement and signed magnitude. 1's complement and signed magnitude are additionally allowed to come with exotic behavior such as trap representations or padding bits. But some 99.999% of all modern real-world computers use 2's complement, so all of this flexibility is very unhelpful.
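To make the three formats concrete, here is a minimal sketch that decodes the same 8 bit pattern under each convention. The helper functions are hypothetical names made up purely for illustration - a real program only ever sees the one format its hardware uses:

```c
#include <stdint.h>
#include <stdio.h>

/* Decode an 8-bit pattern under each of the three signed formats that
   C historically allowed. Helper names are made up for this example. */
static int twos_complement(uint8_t b) { return (b & 0x80) ? (int)b - 256 : b; }
static int ones_complement(uint8_t b) { return (b & 0x80) ? -(int)(uint8_t)~b : b; }
static int sign_magnitude(uint8_t b)  { return (b & 0x80) ? -(int)(b & 0x7F) : b; }

int main(void)
{
    uint8_t pattern = 0x81;
    printf("0x%02X  2's complement: %4d\n", pattern, twos_complement(pattern));
    printf("0x%02X  1's complement: %4d\n", pattern, ones_complement(pattern));
    printf("0x%02X  sign-magnitude: %4d\n", pattern, sign_magnitude(pattern));
    return 0;
}
```

The same bit pattern 0x81 means -127, -126 or -1 depending on which format the implementation picked - which is exactly why leaving the format unspecified is a portability hazard.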
Why do we need to define these data types so vaguely?
We don't. Not giving the integer types a fixed size and signedness was arguably a naive design mistake. The rationale back in the day was to allow C to run on as many different computers as possible. Which, as it turns out, is not at all the same thing as being able to port C code between different computers.
Lazy programmers might find it handy to sloppily spam `int` everywhere without thinking about integer limits, then get a "suitable, large enough integer of the local signedness". But that's not in the slightest helpful when we, for example, need exactly 16 bits of 2's complement. Or when we need to optimize for size. Or when we are using an 8 bit CPU and want to avoid anything larger than 8 bits whenever possible.
So `int` & friends are not quite portable: the size and signedness format is unknown and inconsistent across platforms, making these so-called "primitive data types" potentially dangerous and/or inefficient.
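A quick way to see this in practice is to print what a given compiler happens to use - only the minimum ranges are guaranteed by the language, so the output of this sketch differs between, say, a 16 bit MCU toolchain and a 64 bit desktop compiler:

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* None of these values are fixed by the language itself - int is only
       required to cover at least -32767..32767. */
    printf("sizeof(int) = %zu bytes\n", sizeof(int));
    printf("INT_MAX     = %d\n", INT_MAX);
    printf("CHAR_BIT    = %d\n", CHAR_BIT);
    printf("char is %s by default\n", (CHAR_MIN < 0) ? "signed" : "unsigned");
    return 0;
}
```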
To make things worse, the unpredictable behavior of `int` collides with other language flaws like implicit int type promotion (see Implicit type promotion rules), or the fact that integer constants like `1` are always `int`. These rules were meant to turn every expression into `int`, to save incompetent programmers from themselves, in case they did arithmetic with overflow on small, signed integer types.
For example `int8_t i8=0; ... i8 = i8 + 256;` doesn't actually cause signed overflow in C, because the operation is carried out on the type `int`, which is then converted back to the small integer type `int8_t` (although in an implementation-defined manner).
However, the implicit promotion rules always caused more harm than good. Your `unsigned short` may suddenly and silently turn into a signed `int` when ported from a 16 bit system to a 32 bit system. Which in turn can create all manner of subtle bugs, particularly when using bitwise operators or writing hardware-related code. And the rules create an inconsistency between how small integer types and large integer types behave inside expressions.
To solve some of these problems, `stdint.h` was introduced into the language back in 1999. It contains types like `uint8_t` that are guaranteed to have a fixed size no matter the system, and the signed variants like `int8_t` are guaranteed to be 2's complement. In addition, we may use types like `uint_fast8_t` to let the compiler pick the fastest suitable type for a given system, portably. Most professional C software nowadays - embedded systems in particular - only ever uses the `stdint.h` types and never the native ones.
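A minimal sketch of what that looks like in practice (variable names made up for illustration); `inttypes.h` provides the matching printf specifiers for the fixed-width and "fast" types:

```c
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint8_t  reg  = 0xA5u;        /* exactly 8 bits on every conforming system  */
    uint32_t addr = 0xDEADBEEFu;  /* exactly 32 bits on every conforming system */

    /* "fast" type: at least 8 bits, but the compiler may pick a wider type
       that is faster on the target CPU (often 32 bits on a 32 bit core). */
    uint_fast8_t counter = 0;
    counter++;

    printf("reg = 0x%" PRIX8 ", addr = 0x%" PRIX32 "\n", reg, addr);
    printf("counter = %" PRIuFAST8 ", sizeof(uint_fast8_t) = %zu\n",
           counter, sizeof(uint_fast8_t));
    return 0;
}
```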
`stdint.h` makes it easier to port code, but it doesn't really solve the implicit promotion problems. To solve those, the language would have to be redesigned with a stronger type system that forces all integer conversions to be explicit, with casts. Since there is no hope of C ever getting fixed, safe subsets of the language were developed, such as MISRA-C and CERT-C. A significant portion of these documents is dedicated to solving implicit conversion bugs.
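A rough sketch of the coding style these guidelines push towards - not a quote of any specific rule, and the function/parameter names are made up for illustration:

```c
#include <stdint.h>

/* Make every narrowing conversion explicit instead of implicit. */
uint16_t add_u16(uint16_t a, uint16_t b)
{
    /* Both operands are implicitly promoted to int before the addition;
       casting the result back to uint16_t documents the intended width
       rather than relying on a silent implicit conversion on return. */
    return (uint16_t)(a + b);
}
```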
A note about `size_t` specifically: it is guaranteed to be unsigned and "large enough", but that's about it. Not enough thought was given to defining what it is supposed to represent. The maximum size of an object? An array? Or just the type returned by `sizeof`? There's an unexpected dependency between it and `ptrdiff_t` - another language flaw - see this exotic problem I ran into when using `size_t` to represent the maximum allowed size of an array.
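To illustrate the dependency (not the specific problem linked above, just the general shape of it): `size_t` can describe the size of an object, but pointer subtraction yields the signed `ptrdiff_t`, so an object larger than `PTRDIFF_MAX` bytes is already on thin ice even though `size_t` has no trouble representing its size.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* size_t is unsigned and "large enough" to hold any sizeof result... */
    size_t n = sizeof(int[100]);
    printf("sizeof(int[100]) = %zu\n", n);

    /* ...but pointer subtraction yields the signed ptrdiff_t, whose range
       is typically half of what size_t can represent. */
    printf("SIZE_MAX    = %zu\n", (size_t)SIZE_MAX);
    printf("PTRDIFF_MAX = %td\n", (ptrdiff_t)PTRDIFF_MAX);
    return 0;
}
```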