17

Do the C and C++ standards allow sizeof of a numeric type not to be a power of two?

The following constraints are known:

  • 16 <= CHAR_BIT * sizeof(int) <= CHAR_BIT * sizeof(long)
  • 32 <= CHAR_BIT * sizeof(long) <= CHAR_BIT * sizeof(long long)
  • and a dozen others, which on a typical architecture with 8-bit bytes means 2 <= sizeof(int) && 4 <= sizeof(long)

Does that mean that sizeof(int) == 3 && sizeof(long) == 5 is a valid behaviour?

If so, is there any known compiler or architecture that behaves this way?

sasha.sochka
  • 8
    Why would you need to *check* any of this? Just write portable code that doesn't depend on the type widths... – Kerrek SB Jul 24 '13 at 13:03
  • 2
    Where does `sizeof(int) <= 4` come from? And on an architecture where `CHAR_BIT == 32`, I can't imagine `sizeof(int) == 1` being invalid. – Angew is no longer proud of SO Jul 24 '13 at 13:04
  • 1
    Still, where does it say `CHAR_BIT * sizeof(int) <= 32`? – Angew is no longer proud of SO Jul 24 '13 at 13:10
  • There certainly are 64-bit architectures which don't have smaller addressable types; so they couldn't obey your `<= 32` constraint. – Mike Seymour Jul 24 '13 at 13:11
  • 3
    There are 24-bit DSP architectures; however, I don't know if any of those have addressable 8-bit bytes (which would give `sizeof(int)==3`). The only one I've worked with had `CHAR_BIT==24` and `sizeof(int)==1`. – Mike Seymour Jul 24 '13 at 13:16
  • 1
    @MikeSeymour all 24-bit DSPs I've heard of also have `CHAR_BIT == 24` and `sizeof(int) == sizeof(char)`, but recently I found out that the Motorola DSP5600x/3xx series has a 16-bit short and a 24-bit int. It also has a 32-bit long when running in 16-bit mode – phuclv Feb 06 '15 at 07:42

4 Answers

14

I think 3.9.1/2 (C++98) sums this up nicely (immediately followed by analogous information for the unsigned types):

There are four signed integer types: “signed char”, “short int”, “int”, and “long int.” In this list, each type provides at least as much storage as those preceding it in the list. Plain ints have the natural size suggested by the architecture of the execution environment; the other signed integer types are provided to meet special needs.

Basically all we know is that sizeof(char) == 1 and that each "larger" type is at least that large, with int being a "natural" size for an architecture (where as far as I can tell "natural" is up to the compiler writer). We don't know anything like CHAR_BIT * sizeof(int) <= 32 etc. Also keep in mind that CHAR_BIT doesn't have to be 8 either.

It seems fairly safe to say that a three-byte int and a five-byte long would be allowed on hardware where those sizes are native. I am, however, not aware of any such hardware or architecture.

EDIT: As pointed out in @Nigel Harper's comment, we do know that int has to be at least 16 bits and long at least 32 bits to satisfy the range requirements. Otherwise we don't have any specific size restrictions beyond those seen above.
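As a rough sketch of what the quoted wording and the minimum ranges actually pin down, the guarantees can be written as compile-time checks (C++11 or later). The helper name `storage_bits` is made up for this illustration; note that none of the checks mentions powers of two.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper for this sketch: storage width of T in bits,
// padding bits included.
template <typename T>
constexpr std::size_t storage_bits() {
    return static_cast<std::size_t>(CHAR_BIT) * sizeof(T);
}

// Ordering of storage sizes ("at least as much storage as those preceding it").
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(signed char) <= sizeof(short), "short >= signed char");
static_assert(sizeof(short) <= sizeof(int), "int >= short");
static_assert(sizeof(int) <= sizeof(long), "long >= int");

// Minimum widths implied by the required value ranges.
static_assert(CHAR_BIT >= 8, "a char has at least 8 bits");
static_assert(storage_bits<int>() >= 16, "int has at least 16 bits");
static_assert(storage_bits<long>() >= 32, "long has at least 32 bits");
```

An implementation with `sizeof(int) == 3` and `sizeof(long) == 5` at `CHAR_BIT == 8` would pass every one of these assertions.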

Mark B
  • 5
    Some Unisys mainframes have `sizeof(int) == 6` (and others have `CHAR_BIT == 9`). (The old CDC mainframes had 60 bit words; a C implementation would probably have had `CHAR_BIT == 10` and `sizeof(int) == 6`. But I don't think there was ever a C compiler for them, much less C++; it went out of production some time in the 1970s.) – James Kanze Jul 24 '13 at 13:23
  • We do know 16 <= CHAR_BIT * sizeof(int) - it's saying int is at least a 16 bit type which it has to be to satisfy the minimum range the standard dictates. Ditto 32 <= sizeof(long) * CHAR_BIT. – Nigel Harper Jul 24 '13 at 13:53
  • 1
    @JamesKanze, the CDC architecture is the only one I've seen where the size of an int (60 bits) is larger than the size of a pointer (18 bits). Also using the native character set a character was 6 bits, but they weren't individually addressable. – Mark Ransom Jul 24 '13 at 14:38
  • That's 3.9.1/2 of the C++ standard, right? It's important to specify that, particularly since the question is tagged both C and C++. – Keith Thompson Jul 24 '13 at 14:45
  • @MarkRansom Yes, but an implementation of C with 6 bit characters wouldn't be legal. And C also requires that the number of `char` in an `int` be a whole number (the usual PDP-10 solution of putting 5 seven bit `char` in a 36 bit word wouldn't be legal), so that forces us up to 10. The instruction set apparently assumes that bytes are 12 bits, so maybe 12 bit `char` and `sizeof(int) == 5` would have been more appropriate. (But I'm only guessing. I never actually programmed in assembler on the beast.) – James Kanze Jul 24 '13 at 14:57
11

TL;DR

The behavior is valid, and such compilers/architectures do exist:

  • TI C5500/C6000 with 4-byte int, 5-byte long
  • Motorola DSP5600x/3xx series with 2-byte short, 3-byte int, 6-byte long
  • x86 with 8-byte double, 10-byte long double

The number of bits used to represent the type long is not always the same as, or an integer multiple of, the number of bits in the type int. The ability to represent a greater range of values (than is possible in the type int) may be required, but processor costs may also be a consideration...

Derek M. Jones' The New C Standard (Excerpted material) - An Economic and Cultural Commentary


The other answers have already recapped the C++ standard's requirements. Similarly, the C standard doesn't constrain the sizes in bytes of types (floating-point or integer) to powers of 2. The most common example is long double, which is most often 10 bytes on x86 (padded to 12 or 16 bytes by many modern compilers).
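To see what a given compiler does with long double, one can compare its storage size with the number of significand bits it actually carries. This is only a sketch: the struct and function names are invented here, and the values depend entirely on the target and compiler (for instance, a padded 80-bit format would report a larger `size_bytes` than the format itself needs).

```cpp
#include <cfloat>
#include <cstddef>

// Hypothetical summary type for this sketch.
struct LongDoubleLayout {
    std::size_t size_bytes;     // sizeof(long double), padding included
    int significand_digits;     // LDBL_MANT_DIG, counted in base FLT_RADIX
    bool size_is_power_of_two;  // whether sizeof happens to be a power of two
};

// Collects the implementation's long double parameters.
inline LongDoubleLayout long_double_layout() {
    const std::size_t n = sizeof(long double);  // n >= 1, so the bit trick is safe
    return LongDoubleLayout{n, LDBL_MANT_DIG, (n & (n - 1)) == 0};
}
```

Printing the three fields on different compilers for the same CPU is a quick way to observe the padding differences mentioned above.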

ISO/IEC 9899:1999 (E)

5.2.4.2.1 Sizes of integer types <limits.h>

  1. The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign. [...]

6.2.5 Types

  1. There are five standard signed integer types, designated as signed char, short int, int, long int, and long long int. (These and other types may be designated in several additional ways, as described in 6.7.2.) There may also be implementation-defined extended signed integer types.

    The standard and extended signed integer types are collectively called signed integer types.

  2. For any two integer types with the same signedness and different integer conversion rank (see 6.3.1.1), the range of values of the type with smaller integer conversion rank is a subrange of the values of the other type.


Odd-sized integer types are much rarer, but they still exist. Many DSPs have standard-conforming compilers with non-power-of-2 types, where int has 32 bits and long has 40 bits.

long is

  • 40 bits or 5 bytes for C6000 COFF. This is fully compliant with any major C/C++ standard as those standards are all defining a minimum requirement of 4 byte for long (aka. long int). Programmers are often falsely assuming this type having a size of exactly 4 bytes.

Emphasis mine
C89 Support in TI Compilers#Misunderstandings about TI C
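A minimal sketch of the kind of code that the false assumption breaks: arithmetic that relies on `unsigned long` wrapping around at 2^32 gives different results when long has 40 bits. Masking explicitly avoids depending on the width. The function name `add_wrap32` is invented for this example.

```cpp
#include <cstdint>

// Hypothetical helper: add two values modulo 2^32 without assuming that
// unsigned long is exactly 32 bits wide. unsigned long is at least 32 bits,
// so the explicit mask yields the same result on 32-, 40- and 64-bit long.
constexpr std::uint_least32_t add_wrap32(std::uint_least32_t a,
                                         std::uint_least32_t b) {
    return static_cast<std::uint_least32_t>(
        (static_cast<unsigned long>(a) + static_cast<unsigned long>(b)) &
        0xFFFFFFFFul);
}

static_assert(add_wrap32(0xFFFFFFFFul, 1ul) == 0u, "wraps at 2^32");
```

Without the mask, `0xFFFFFFFFul + 1ul` is 0 only where unsigned long is exactly 32 bits wide.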

Side note: on some TI targets, long long is itself only a 32-bit or 40-bit type, which is valid as an extension to C89 but violates C99:

Some targets have long long (an extension from C99), but not a conforming one. C99 requires at least 64 bits, but C2700 has 32-bit long long, and C5500 has 40-bit long long. C2800, C6000, and ARM have 64-bit long long, and C5400 and MSP430 do not support long long. This is not technically a violation of C89, since this is actually an extension, but if we start supporting C99, this would be a violation of C99 (C99 5.2.4.2.1 "Sizes of integer types <limits.h>" para 1).

The wider type's size doesn't even have to be a multiple of the preceding type's size. Continuing with what Derek M. Jones said in The New C Standard (Excerpted material): An Economic and Cultural Commentary:

... For instance, the Texas Instruments TMS320C6000, a DSP processor, uses 32 bits to represent the type int and 40 bits to represent the type long (this choice is not uncommon). Those processors (usually DSP) that use 24 bits to represent the type int, often use 48 bits to represent the type long. The use of 24/48 bit integer type representations can be driven by application requirements where a 32/64-bit integer type representation are not cost effective.

In all the 24-bit DSPs I had known before, CHAR_BIT == 24 and all types have sizes that are multiples of 24 bits, but I've just found out that the Motorola DSP5600x/3xx series has a really "strange" type system:

| Data type | Size in bits |
|---|---|
| (un)signed char | 8 |
| (un)signed short | 16 |
| (un)signed int | 24 |
| (un)signed long | 48 |
| (long) _fract | 24 (48) |
| pointer | 16/24 |
| float/double | 24+8 |
| enum | 24 |

So in this case sizeof(char) == 1 and sizeof(short) == 2 but sizeof(int) == 3 and sizeof(long) == 6
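The arithmetic behind those sizeof values is just the bit width divided by the width of a char, rounded up. A small sketch, where `size_in_chars` is a made-up helper and the widths are the ones from the table above:

```cpp
#include <cstddef>

// Hypothetical helper: how many char-sized units a type of `width_bits`
// occupies when a char has `char_bit` bits (rounded up).
constexpr std::size_t size_in_chars(std::size_t width_bits,
                                    std::size_t char_bit) {
    return (width_bits + char_bit - 1) / char_bit;
}

// 8-bit char, as on the Motorola DSP5600x/3xx compiler described above.
static_assert(size_in_chars(16, 8) == 2, "short");
static_assert(size_in_chars(24, 8) == 3, "int");
static_assert(size_in_chars(48, 8) == 6, "long");

// The more common 24-bit DSP convention: CHAR_BIT == 24.
static_assert(size_in_chars(24, 24) == 1, "int is a single char");
```

The same 24-bit int therefore gives `sizeof(int) == 3` or `sizeof(int) == 1` depending only on how wide the implementation makes char.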

Unfortunately, GCC calls long and long long double-word integers, and so do most people, which causes a lot of misunderstanding, since they aren't necessarily double the size.

phuclv
1

The C++ standard (and almost certainly the C standard, but I haven't looked at it for a very long time) does not have a rule that says anything about the NUMBER of bits that a type should be. I know for a fact that 9-bit char is allowed, and there are machines with 36-bit integers. Last time I checked, neither 9 nor 36 is a power of 2.
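A quick way to check this kind of claim is the usual bit trick for powers of two. In the sketch below `is_power_of_two` is an illustrative helper, and the 9-bit char and 36-bit int figures are the ones mentioned above; it also shows that the width in bits and the value of sizeof are separate questions.

```cpp
#include <cstddef>

// Hypothetical helper: true when n is a power of two (1, 2, 4, 8, ...).
constexpr bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Widths in bits on a machine with 9-bit char and 36-bit int.
static_assert(!is_power_of_two(9), "9 bits per char");
static_assert(!is_power_of_two(36), "36 bits per int");

// sizeof(int) on that machine would be 36 / 9 == 4, which is a power of two.
static_assert(is_power_of_two(36 / 9), "sizeof(int) == 4");
```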

Mats Petersson
  • 3
    But `sizeof(int)` would be 4 on that platform, which is a power of 2. I'm not quite sure if the OP meant that `sizeof(x)` must be a power of two, or `sizeof(x) * CHAR_BIT`. – Sebastian Redl Jul 24 '13 at 13:34
  • 1
    @SebastianRedl, I meant `sizeof(type)`, not `sizeof(type) * CHAR_BIT` – sasha.sochka Jul 24 '13 at 13:48
  • 2
    Still, there's nothing in the standard that forbids either. It may be PRACTICAL to use powers of two when building hardware, but that's a different matter. – Mats Petersson Jul 24 '13 at 13:49
  • 1
    Anyway, if you include floating point types in numeric types, there's a simple counterexample even on x86: sizeof(long double) == 10 in MSVC I think. Unless it's overaligned, but I don't think it is. – Sebastian Redl Jul 24 '13 at 14:10
  • @SebastianRedl on MSVC `sizeof(long double)==8`, see [here](https://gcc.godbolt.org/z/OM5sPb). They don't support 80-bit long-doubles at all. Your example is right though for GCC, where for 32-bit x86 `sizeof(long double)==10`. – Ruslan Sep 12 '19 at 14:19
1

There are definitely platforms with 24-bit ints. They are still used today for certain embedded applications. You could check Wikipedia for further information: http://en.wikipedia.org/wiki/24-bit

Johan Kotlinski
  • 24-bit int does not mean that sizeof(int) is 3. The OP is asking about odd **sizeof**, not odd size in bits – phuclv Oct 11 '13 at 09:10
  • Not sure what you mean. To me it seems perfectly logical that 24-bit int gives sizeof(int) == 3 bytes. Why would or should it be anything else? – Johan Kotlinski Oct 11 '13 at 10:26
  • A byte is [not always 8 bits](http://stackoverflow.com/questions/2098149/what-platforms-have-something-other-than-8-bit-char), which is why C/C++ define CHAR_BIT. There are computers with 9-, 12-, 16-, 18-, 24-, 32-bit or even odder char; in that case the size of a 24-bit int obviously won't be 3 – phuclv Oct 11 '13 at 13:25
  • I think for current day it can be assumed that byte is 8 bit unless stated otherwise – Johan Kotlinski Oct 12 '13 at 09:14
  • 1
    No. There are lots of DSP architectures that have 16, 24 or 32 bit char. It's never safe to assume anything. There are hundreds of questions about this on Stack Overflow; you should read some before assuming – phuclv Oct 12 '13 at 14:06
  • Thank you for your wisdom but what does that at all have to do with the original question? – Johan Kotlinski Oct 12 '13 at 16:00
  • You didn't read my comment? I've just said that 24-bit int does not mean that sizeof(int) is 3. That's what you should take care of in your answer – phuclv Oct 13 '13 at 00:00
  • Well :) You can interpret my answer in the way that 8-bit char is implicit. Why would there not be platforms that have 24-bit ints and 8-bit chars? – Johan Kotlinski Oct 14 '13 at 12:57
  • 2
    well, in those [24-bit architectures](http://stackoverflow.com/questions/2098149/what-platforms-have-something-other-than-8-bit-char?lq=1) `CHAR_BIT == 24` so `sizeof(char) == sizeof(short) == sizeof(int) == 1` and `int` is 1 byte, therefore all the sizes are still powers of 2. The OP is interested in the case where the size is not a power of 2. – phuclv Feb 05 '15 at 08:12