Why are all data type sizes always a power of 2?
Let's take two examples:
short int 16
char 8
Why are they not like the following?
short int 12
That's an implementation detail, and it isn't always the case. Some exotic architectures have non-power-of-two data types. For example, 36-bit words were common at one stage.
The reason powers of two are almost universal these days is that it typically simplifies internal hardware implementations. As a hypothetical example (I don't do hardware, so I have to confess that this is mostly guesswork), the portion of an opcode that indicates how large one of its arguments is might be stored as the power-of-two index of the number of bytes in the argument, thus two bits is sufficient to express which of 8, 16, 32 or 64 bits the argument is, and the circuitry required to convert that into the appropriate latching signals would be quite simple.
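To make that hypothetical encoding concrete, here is a minimal sketch. The two-bit field layout and the name operand_bits are invented for illustration and are not taken from any real instruction set.

```c
#include <assert.h>

/* Hypothetical 2-bit size field: 0 -> 8, 1 -> 16, 2 -> 32, 3 -> 64 bits.
   The field stores log2 of the operand size in bytes. */
unsigned operand_bits(unsigned size_field)
{
    assert(size_field < 4);      /* only two bits are available */
    return 8u << size_field;     /* 8 * 2^size_field */
}
```

A 12-bit or 24-bit operand would not fit this scheme: the decoder would need a lookup table instead of a single shift.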
The reason why builtin types are those sizes is simply that this is what CPUs support natively, i.e. it is the fastest and easiest. No other reason.
As for structs, you can have variables in there which have (almost) any number of bits, but you will usually want to stay with integral types unless there is a really urgent reason for doing otherwise.
You will also usually want to group identical-size types together and start a struct with the largest types (usually pointers).
That will avoid needless padding and it will make sure you don't have access penalties that some CPUs exhibit with misaligned fields (some CPUs may even trigger an exception on unaligned access, but in this case the compiler would add padding to avoid it, anyway).
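A small sketch of the padding effect, assuming a typical compiler that aligns double to more than one byte. The struct and function names are made up for this example, and the exact sizes are implementation-defined.

```c
#include <assert.h>
#include <stddef.h>

/* Members ordered carelessly: the compiler typically pads after 'a' and 'b'. */
struct scattered {
    char   a;
    double d;
    char   b;
};

/* Same members, largest first: typically less padding. */
struct grouped {
    double d;
    char   a;
    char   b;
};

/* Bytes saved by reordering (0 if the platform needs no padding). */
size_t padding_saved(void)
{
    assert(sizeof(struct grouped) <= sizeof(struct scattered));
    return sizeof(struct scattered) - sizeof(struct grouped);
}
```

On common 64-bit compilers this usually works out to 24 bytes versus 16, but check sizeof on your own target rather than relying on that.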
The sizes of char, short, int, long, etc. differ depending on the platform. 32-bit architectures tend to have char=8, short=16, int=32, long=32. 64-bit architectures tend to have char=8, short=16, int=32, long=64, although 64-bit Windows keeps long=32.
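If you want to see what your own platform does, something along these lines should work. The BITS macro and the function name are just helpers for this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Width of a type in bits, counting any padding bits. */
#define BITS(type) (sizeof(type) * CHAR_BIT)

void print_type_widths(void)
{
    /* Minimums guaranteed by the C standard. */
    assert(CHAR_BIT >= 8 && BITS(short) >= 16 && BITS(long) >= 32);

    printf("char=%zu short=%zu int=%zu long=%zu long long=%zu\n",
           BITS(char), BITS(short), BITS(int), BITS(long), BITS(long long));
}
```

The output is whatever your compiler and ABI chose, which is the point: only the minimums are fixed by the language.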
Many DSPs don't have power of 2 types. For example, Motorola DSP56k (a bit dated now) has 24 bit words. A compiler for this architecture (from Tasking) has char=8, short=16, int=24, long=48. To make matters confusing, they made the alignment of char=24, short=24, int=24, long=48. This is because it doesn't have byte addressing: the minimum accessible unit is 24 bits. This has the exciting (annoying) property of involving lots of divide/modulo 3 when you really do have to access an 8 bit byte in an array of packed data.
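The divide/modulo 3 dance looks roughly like this. This is a sketch on an ordinary host, modelling each 24-bit word as the low bits of a uint32_t; the function name and the byte order inside a word (byte 0 in the low 8 bits) are assumptions, not something the Tasking compiler dictates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each array element models one 24-bit word holding three packed bytes.
   Assumption: byte 0 sits in the low 8 bits of the word. */
uint8_t packed_byte(const uint32_t *words, size_t nwords, size_t index)
{
    size_t   word = index / 3;              /* which 24-bit word */
    unsigned pos  = (unsigned)(index % 3);  /* which byte inside it */

    assert(word < nwords);
    return (uint8_t)((words[word] >> (8 * pos)) & 0xFFu);
}
```

With power-of-two packing (say four bytes per 32-bit word) the division and modulo collapse into a shift and a mask, which is much cheaper on most hardware.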
You'll only find non-power-of-2 in special purpose cores, where the size is tailored to fit a special usage pattern, at an advantage to performance and/or power. In the case of 56k, this was because there was a multiply-add unit which could load two 24 bit quantities and add them to a 48 bit result in a single cycle on 3 buses simultaneously. The entire platform was designed around it.
The fundamental reason most general purpose architectures use powers-of-2 is because they standardized on the octet (8 bit bytes) as the minimum size type (aside from flags). There's no reason it couldn't have been 9 bit, and as pointed out elsewhere 24 and 36 bit were common. This would permeate the rest of the design: if x86 was 9 bit bytes, we'd have 36 octet cache lines, 4608 octet pages, and 569KB would be enough for everyone :) We probably wouldn't have 'nibbles' though, as you can't divide a 9 bit byte in half.
This is pretty much impossible to do now, though. It's all very well having a system designed like this from the start, but inter-operating with data generated by 8 bit byte systems would be a nightmare. It's already hard enough to parse 8 bit data in a 24 bit DSP.
Well, they are powers of 2 that are also multiples of 8, and this comes (simplifying a little) from the fact that the atomic allocation unit in memory is usually a byte, which (edit: often, but not always) consists of 8 bits.
Bigger data sizes are made taking multiple bytes at a time. So you could have 8,16,24,32... data sizes.
Then, for the sake of memory access speed, only powers of 2 are used as a multiplier of the minimum size (8), so you get data sizes along these lines:
8 => 8 * 2^0 bits => char
16 => 8 * 2^1 bits => short int
32 => 8 * 2^2 bits => int
64 => 8 * 2^3 bits => long long int
8 bits is the most common size for a byte (but not the only size, examples of 9 bit bytes and other byte sizes are not hard to find). Larger data types are almost always multiples of the byte size, hence they will typically be 16, 32, 64, 128 bits on systems with 8 bit bytes, but not always powers of 2, e.g. 24 bits is common for DSPs, and there are 80 bit and 96 bit floating point types.
There are a few cases where integral types must be an exact power of two. If the exact-width types in <stdint.h> exist, such as int16_t or uint32_t, their widths must be exactly that size, with no padding. Floating-point math that declares itself to follow the IEEE standard forces float and double to be powers of two (although long double often is not). There are additionally types char16_t and char32_t in the standard library now, or built-in to C++, defined as exact-width types. The requirements about support for UTF-8 in effect mean that char and unsigned char have to be exactly 8 bits wide.
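A quick way to convince yourself about the exact-width types is a check like the one below. It is only a sketch: the function name is made up, and the code compiles only on platforms that actually provide int16_t and uint32_t (they are optional).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Aborts if the exact-width types are not exactly as wide as their names say. */
void check_exact_widths(void)
{
    assert(sizeof(int16_t)  * CHAR_BIT == 16);
    assert(sizeof(uint32_t) * CHAR_BIT == 32);

    /* With no padding bits, the maximum value uses every value bit. */
    assert(INT16_MAX  == 0x7FFF);
    assert(UINT32_MAX == 0xFFFFFFFFu);
}
```

On a machine with, say, 9-bit bytes these types simply would not exist, and code that uses them would fail to compile rather than misbehave.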
In practice, a lot of legacy code would already have broken on any machine that didn’t support types exactly 8, 16, 32 and 64 bits wide. For example, any program that reads or writes ASCII or tries to connect to a network would break.
Some historically-important mainframes and minicomputers had native word sizes that were multiples of 3, not powers of two, particularly the DEC PDP-6, PDP-8 and PDP-10.
This was the main reason that base 8 used to be popular in computing: since each octal digit represented three bits, a 9-, 12-, 18- or 36-bit pattern could be represented more neatly by octal digits than decimal or hex. For example, when using base-64 to pack characters into six bits instead of eight, each packed character took up two octal digits.
The two most visible legacies of those architectures today are that, by default, character escapes such as '\123' are interpreted as octal rather than decimal in C, and that Unix file permissions/masks are represented as three or four octal digits.
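Both legacies are easy to check in a few lines. The function names here are invented for the example; the mode 0755 is just a familiar permission value (rwxr-xr-x).

```c
#include <assert.h>

/* '\123' is an octal escape: 1*64 + 2*8 + 3 = 83, which is 'S' in ASCII. */
int octal_escape_value(void)
{
    return '\123';
}

/* A leading 0 makes an integer literal octal: rwxr-xr-x as three digits. */
unsigned typical_dir_mode(void)
{
    unsigned mode = 0755;            /* 7*64 + 5*8 + 5 = 493 decimal */
    assert((mode & 07) == 05);       /* "other" bits: r-x */
    return mode;
}
```

Each octal digit of the mode maps to exactly one rwx triple, which is why octal has survived there while hex took over almost everywhere else.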
The sizes of standard integral types are defined as multiples of 8 bits, because a byte is 8 bits (with a few extremely rare exceptions) and the data bus of the CPU is normally a multiple of 8 bits wide.
If you really need 12-bit integers then you could use bit fields in structures (or unions) like this:
struct mystruct
{
    short int twelveBitInt : 12;  /* 12-bit field */
    short int threeBitInt : 3;    /* 3-bit field */
    short int bitFlag : 1;        /* single-bit flag */
};
This can be handy in embedded/low-level environments, but bear in mind that the overall size of the structure will still be padded out to the full size of the underlying type.
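Here is a sketch of how such fields behave. It uses unsigned int fields instead of short int, because that is a field type every C compiler must accept (other field types are implementation-defined); the struct and function names are made up for the example.

```c
#include <assert.h>

/* Variant of the struct above using unsigned int fields. */
struct packed16 {
    unsigned int twelveBitInt : 12;
    unsigned int threeBitInt  : 3;
    unsigned int bitFlag      : 1;
};

/* Stores a value in the 12-bit field and reads it back. */
unsigned store_twelve(unsigned value)
{
    struct packed16 s = {0};
    s.twelveBitInt = value;   /* unsigned fields keep only the low 12 bits */
    assert(s.twelveBitInt < 4096u);
    return s.twelveBitInt;
}
```

Values that need more than 12 bits wrap modulo 4096, and sizeof(struct packed16) is normally at least the size of one unsigned int even though only 16 bits are used.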
They aren't necessarily. On some machines and compilers, sizeof(long double) == 12 (96 bits).
Not every data type has to use a power-of-2 number of bits. For example, long double can use 80 bits (though how many bits are actually allocated is implementation-dependent).
One advantage you gain from using powers of 2 is that larger data types can be built out of smaller ones. For example, 4 chars (8 bits each) can make up an int (32 bits). In fact, some compilers used to simulate 64-bit numbers using two 32-bit numbers.
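Simulating a 64-bit number with two 32-bit halves might look roughly like the following. This is an illustrative sketch, not how any particular compiler did it; struct u64_pair and add_pair are invented names, and the cross-check uses the native uint64_t only to verify the result.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit unsigned value modelled as two 32-bit halves. */
struct u64_pair {
    uint32_t hi;
    uint32_t lo;
};

/* Adds two emulated 64-bit values, propagating the carry by hand. */
struct u64_pair add_pair(struct u64_pair a, struct u64_pair b)
{
    struct u64_pair r;
    r.lo = a.lo + b.lo;              /* wraps modulo 2^32 */
    uint32_t carry = r.lo < a.lo;    /* wrapped if the sum got smaller */
    r.hi = a.hi + b.hi + carry;

    /* Cross-check against the native 64-bit type. */
    uint64_t native = (((uint64_t)a.hi << 32) | a.lo)
                    + (((uint64_t)b.hi << 32) | b.lo);
    assert(r.lo == (uint32_t)native && r.hi == (uint32_t)(native >> 32));
    return r;
}
```

The halves line up cleanly because 64 is exactly twice 32; with a 48-bit type built from 32-bit words, the upper half would need masking on every operation.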
Most of the time your computer tries to keep all data formats at either a whole multiple (2, 3, 4...) or a whole fraction (1/2, 1/3, 1/4...) of the machine word size. It does this so that each time it loads N data words it loads a whole number of values for you. That way, it doesn't have to recombine parts later on.
You can see this in the x86 for example:
a char is 1/4th of 32-bits
a short is 1/2 of 32-bits
an int / long are a whole 32 bits
a long long is 2x 32 bits
a float is a single 32-bits
a double is two times 32-bits
a long double may either be three or four times 32-bits, depending on your compiler settings. This is because for 32-bit machines it's three native machine words (so no overhead) to load 96 bits. On 64-bit machines it is 1.5 native machine word, so 128 bits would be more efficient (no recombining). The actual data content of a long double on x86 is 80 bits, so both of these are already padded.
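You can inspect the padding on your own system with a sketch like this. The function names are made up; LDBL_MANT_DIG comes from <float.h> and is 64 for the x87 80-bit format, but other ABIs use different formats.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stdio.h>

/* Storage size of long double in bits, including any padding. */
unsigned long_double_storage_bits(void)
{
    assert(sizeof(long double) >= sizeof(double));
    return (unsigned)(sizeof(long double) * CHAR_BIT);
}

void report_long_double(void)
{
    /* LDBL_MANT_DIG is 64 for the x87 80-bit format; other ABIs differ. */
    printf("storage: %u bits, mantissa: %d bits\n",
           long_double_storage_bits(), LDBL_MANT_DIG);
}
```

If the storage size is 96 or 128 bits while the mantissa is 64 bits, you are looking at the padded 80-bit format described above.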
A last aside, the computer doesn't always load in its native data size. It first fetches a cache line and then reads from that in native machine words. The cache line is larger, usually around 64 or 128 bytes. It's very useful to have a meaningful bit of data fit into this and not be stuck on the edge, as you'd then have to load two whole cache lines to read it. That's why most computer structures are a power of two in size: as long as an object is aligned to its own size, it fits into any power-of-two-sized storage as a half, a whole, a double or more, and it never ends up straddling a boundary.
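The boundary argument can be checked with a little arithmetic. This sketch assumes a 64-byte cache line (CACHE_LINE and straddles_line are names chosen for the example; the real line size depends on the CPU).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u   /* assumed line size; query the real one on your CPU */

/* Returns 1 if an object of 'size' bytes at 'addr' crosses a line boundary. */
int straddles_line(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= CACHE_LINE);
    uintptr_t first = addr / CACHE_LINE;               /* line of first byte */
    uintptr_t last  = (addr + size - 1) / CACHE_LINE;  /* line of last byte */
    return first != last;
}
```

An 8-byte object at any multiple of 8 stays inside one line, whereas a 12-byte object in a tightly packed array sooner or later lands across two lines (for instance at offset 60).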