7

Why are all data type sizes always a power of 2?

Let's take two examples:

short int 16
char 8

Why are they not like the following?

short int 12
Vivek Goel
  • I think there exists a system where `CHAR_BIT==9`, and an `int` is 36 bits. – dreamlax Mar 04 '11 at 09:25
  • And there are many DSPs where the word size is 24 bits. – Paul R Mar 04 '11 at 09:27
  • When you think about it, it just makes sense to make things a power of two, since everything is binary, which is _base_ two. So (most) everything is (usually) a power with _base_ two. – Andrew Marshall Mar 04 '11 at 09:29
  • datatype size, alignment, pagefile size and more are all easier to implement when you can simply manipulate them using easy bit/bitshift/bitmask operations. no need for multiplying/divisions which are way too expensive and nobody wants to produce slow hardware – fazo Mar 04 '11 at 09:50
  • Not all of them are powers of 2. Both [size in bits](https://stackoverflow.com/q/6971886/995714) and [size in bytes](https://stackoverflow.com/q/17834838/995714) can be non-powers of 2 – phuclv Jun 30 '18 at 05:37

10 Answers

11

That's an implementation detail, and it isn't always the case. Some exotic architectures have non-power-of-two data types. For example, 36-bit words were common at one stage.

The reason powers of two are almost universal these days is that they typically simplify the internal hardware implementation. As a hypothetical example (I don't do hardware, so I have to confess that this is mostly guesswork), the portion of an opcode that indicates how large one of its arguments is might be stored as the power-of-two index of the number of bytes in the argument; two bits are then sufficient to express whether the argument is 8, 16, 32 or 64 bits, and the circuitry required to convert that into the appropriate latching signals would be quite simple.
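
To make that guesswork concrete, here is a toy sketch (not any real instruction encoding; the helper name operand_bits is made up): if the size field stores log2 of the byte count, decoding it to a bit width is a single shift, whereas a 12/24/36/48-bit scheme would instead need a multiplication by 12.

#include <stdio.h>

/* Hypothetical decoder for a 2-bit operand-size field.
 * The field stores log2(bytes), so decoding is one shift:
 * 0 -> 8 bits, 1 -> 16 bits, 2 -> 32 bits, 3 -> 64 bits. */
static unsigned operand_bits(unsigned size_field)
{
    return 8u << size_field;   /* 8 * 2^size_field */
}

int main(void)
{
    for (unsigned f = 0; f < 4; ++f)
        printf("size field %u -> %u-bit operand\n", f, operand_bits(f));
    return 0;
}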

Marcelo Cantos
  • Those two bits of the opcode could equally easily specify 12, 24, 36 or 48 bits. It's the choice of having an 8 bit minimum unit which makes everything sized like it is these days. – John Ripley Mar 04 '11 at 21:30
  • A 36-bit word is still a power of 2 (but not a power of 256 (not a whole number of *bytes*)). (*is* in this context meaning *can represent that range of integer values*) – Peter Mortensen May 14 '13 at 12:52
3

The reason built-in types are those sizes is simply that this is what CPUs support natively, i.e. it is what is fastest and easiest. There is no other reason.

As for structs, you can have variables in there which have (almost) any number of bits, but you will usually want to stay with integral types unless there is a really urgent reason for doing otherwise.

You will also usually want to group identically sized types together and start a struct with the largest types (usually pointers), as in the sketch below.
That avoids needless padding, and it makes sure you don't pay the access penalties that some CPUs exhibit with misaligned fields (some CPUs may even trigger an exception on unaligned access, but in that case the compiler would add padding to avoid it anyway).
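
For illustration, here is a minimal sketch; the struct names and the byte counts in the comments are assumptions for a typical 64-bit platform (8-byte pointers, 2-byte short), not guarantees of the language.

#include <stdio.h>

/* Field order matters for padding. */
struct badly_ordered {
    char  flag;      /* 1 byte + 7 bytes of padding so the pointer is aligned */
    void *ptr;       /* 8 bytes */
    short count;     /* 2 bytes + 6 bytes of trailing padding */
};                   /* typically 24 bytes */

struct largest_first {
    void *ptr;       /* 8 bytes */
    short count;     /* 2 bytes */
    char  flag;      /* 1 byte + 5 bytes of trailing padding */
};                   /* typically 16 bytes */

int main(void)
{
    printf("badly_ordered: %zu bytes\n", sizeof(struct badly_ordered));
    printf("largest_first: %zu bytes\n", sizeof(struct largest_first));
    return 0;
}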

Damon
  • What *some* CPUs support natively, you mean. – Cody Gray - on strike Mar 04 '11 at 09:25
  • If _some_ means 99.9% of all existing systems and 100% of all systems that you're likely to ever encounter unless you're a collector, then yes, some. :-) Architectures with 24 or 36 bit integers or 7 bit chars just are not commonplace, you will have to admit. – Damon Mar 04 '11 at 09:32
  • No, that's not what some means. It's also not accurate. Burying your head in the sand and pretending that alternate architectures do not exist is a huge mistake. Assuming implementation details like this is not how you become a better programmer, which should always be the goal. There's a big difference between "not commonplace in my experience" and "99.9% of all existing systems". – Cody Gray - on strike Mar 04 '11 at 09:38
  • @Cody: I'm going to have to disagree with you there. The vast majority of programmers on this planet have never and will never encounter a non-POT architecture, and large amounts of code that assumes POT have been written and will continue to be written for the foreseeable future. Since POT data sizes are ubiquitous in desktop, server and mainstream mobile devices, this is hardly an unreasonable assumption. Of course, it's good to know that other architectures exist, but that doesn't make it wrong to assume POT. It sometimes helps performance an awful lot. – Marcelo Cantos Mar 04 '11 at 09:56
  • @Marcelo: While I agree that it can *sometimes* be helpful to target code to your specific architecture for performance reasons, my point was quite a bit more general. I'm simply arguing that ignoring the presence of alternative architectures and dismissing them as rare to the point of irrelevance is a bad idea. **Especially** to a general question like this one. I'm not disagreeing that most programmers lack experience with such systems (I myself lack such experience), but I see that as a potential hazard, not something to be encouraged. – Cody Gray - on strike Mar 04 '11 at 10:20
  • @Cody: Actually, it's the broader point that I disagree with. Programmers deal with *potential* hazards all the time. Much of the database code I work with implicitly assumes we will never have 10 million rows in our "clients" table. Which assumption is more likely to be violated: a) we will never have 10 million clients, or b) our code will never run on a non-POT architecture? I will think about dealing with (a) when we reach one million clients, whereas I doubt that I will ever give (b) a second thought. Is that really a huge mistake? – Marcelo Cantos Mar 04 '11 at 10:53
  • See, nobody (including me) doubts that systems with NPOT integers exist. And while I agree that making assumptions such as sizeof(void*)==4 is no good and never has been, the reply "because that's what CPUs chew well" is an entirely adequate answer for someone who, to all appearances, is learning the basics of programming and is stumped by the reason for something that seems to be "standard" for no apparent reason. Arguing about whether some mainframes in the 1960s had 6-bit words and 36-bit addresses is more likely to add confusion, to be honest. – Damon Mar 04 '11 at 11:43
  • (limited by max len) As an analogy: This is as if someone asked why TVs (and hdtv signal, and DVDs and whatnot) come in 720(p/i) and 1080(p/i) resolutions, why such odd numbers, why not something else, maybe 1000. If you tell them "because 720 is 1.5 times 480, which used to be what you had on a tv, and 1080 is 1.5 times 720" then people will say "aha, sure, that makes sense". If you say "well you know, there is PAL with 625 lines too, and besides, there exist TVs in black and white...", then that is certainly correct, but does it make someone understand why there's 1080p? – Damon Mar 04 '11 at 11:51
3

The sizes of char, short, int, long, etc. differ depending on the platform. 32-bit architectures tend to have char=8, short=16, int=32, long=32. 64-bit architectures tend to have char=8, short=16, int=32, long=64.

Many DSPs don't have power-of-2 types. For example, the Motorola DSP56k (a bit dated now) has 24-bit words. A compiler for this architecture (from Tasking) has char=8, short=16, int=24, long=48. To make matters confusing, they made the alignment of char=24, short=24, int=24, long=48. This is because it doesn't have byte addressing: the minimum accessible unit is 24 bits. This has the exciting (annoying) property of involving lots of divide/modulo-by-3 arithmetic when you really do have to access an 8-bit byte in an array of packed data (see the sketch below).
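
A rough illustration of that access pattern (plain C, not DSP56k code; a uint32_t stands in for a 24-bit word, and the function name is made up):

#include <stdint.h>
#include <stdio.h>

/* Fetch the n-th 8-bit byte of a stream packed three bytes per 24-bit word:
 * n / 3 selects the word, n % 3 selects the byte within it. */
static uint8_t get_packed_byte(const uint32_t *words, size_t n)
{
    uint32_t word  = words[n / 3];          /* which 24-bit word */
    unsigned shift = (2u - n % 3) * 8u;     /* most significant byte first */
    return (uint8_t)((word >> shift) & 0xFFu);
}

int main(void)
{
    const uint32_t packed[] = { 0x414243, 0x444546 };   /* "ABCDEF", 3 bytes per word */
    for (size_t i = 0; i < 6; ++i)
        putchar(get_packed_byte(packed, i));            /* prints ABCDEF */
    putchar('\n');
    return 0;
}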

You'll only find non-power-of-2 sizes in special-purpose cores, where the size is tailored to some specific usage pattern for a performance and/or power advantage. In the case of the 56k, this was because there was a multiply-add unit which could load two 24-bit quantities and add them to a 48-bit result in a single cycle on three buses simultaneously. The entire platform was designed around it.

The fundamental reason most general-purpose architectures use powers of 2 is that they standardized on the octet (8-bit bytes) as the minimum size type (aside from flags). There's no reason it couldn't have been 9 bits, and as pointed out elsewhere 24 and 36 bits were common. This would permeate the rest of the design: if x86 had 9-bit bytes, we'd have 36-octet cache lines, 4608-octet pages, and 569 KB would be enough for everyone :) We probably wouldn't have 'nibbles' though, as you can't divide a 9-bit byte in half.

This is pretty much impossible to do now, though. It's all very well having a system designed like this from the start, but inter-operating with data generated by 8 bit byte systems would be a nightmare. It's already hard enough to parse 8 bit data in a 24 bit DSP.

John Ripley
  • The choice of 24 bits was of course due to being intended as an audio processor. In fact, the accumulators on 56k are 56 bits wide (hence the name). So it's not even a trivial multiple of 8 bits as a type! The rationale: you can multiply two 24 bit numbers to get 48 bits of answer, and then sum 256 of them together before you need overflow checking. Smart design, and everything got designed around it. – John Ripley Mar 04 '11 at 09:38
  • Edited my answer to include that there's no fundamental reason it has to be powers-of-2. Although I'm glad it is! – John Ripley Mar 04 '11 at 21:48
2

Well, they are powers of 2 because they are multiples of 8, and this comes (simplifying a little) from the fact that the atomic allocation unit in memory is usually a byte, which (edit: often, but not always) is made of 8 bits.

Bigger data sizes are made by taking multiple bytes at a time, so you could have 8-, 16-, 24-, 32-bit... data sizes.

Then, for the sake of memory access speed, only powers of 2 are used as a multiplier of the minimum size (8), so you get data sizes along these lines:

 8 => 8 * 2^0 bits => char
16 => 8 * 2^1 bits => short int
32 => 8 * 2^2 bits => int
64 => 8 * 2^3 bits => long long int
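
A quick way to see these sizes on your own machine (the bit counts in the comments are only what's typical on mainstream platforms; the standard guarantees minimum ranges, not exact sizes):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("char:      %zu bits\n", sizeof(char) * CHAR_BIT);        /* usually 8  */
    printf("short int: %zu bits\n", sizeof(short int) * CHAR_BIT);   /* usually 16 */
    printf("int:       %zu bits\n", sizeof(int) * CHAR_BIT);         /* usually 32 */
    printf("long long: %zu bits\n", sizeof(long long) * CHAR_BIT);   /* usually 64 */
    return 0;
}
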
garph0
  • I stated that I was simplifying, since it is clearly not the case for DSPs. I don't get why the -1, though. – garph0 Mar 04 '11 at 09:38
  • I didn't downvote, but, given that the first half of the first sentence is flat-out wrong (24 is not a power of two), I don't think a downvote is all that harsh. – Marcelo Cantos Mar 04 '11 at 10:03
  • And who said that 24 is a power of two? I said that 24 is a multiple of 8. – garph0 Mar 04 '11 at 11:11
  • The -1 was from me, for stating "... a byte, which is made by 8 bits.", which is incorrect and misleading. If you fix this then I'll remove the -1. – Paul R Mar 04 '11 at 12:12
  • @Paul R: as I already replied to your previous comment, I stated that I was simplifying the matter for simplicity's sake, since it seemed to me that Vivek was speaking of the most common x86 case. Anyway, I imagine that it may not be clear enough, so I'll add a note. – garph0 Mar 04 '11 at 14:48
1

8 bits is the most common size for a byte (but not the only size; examples of 9-bit bytes and other byte sizes are not hard to find). Larger data types are almost always multiples of the byte size, hence they will typically be 16, 32, 64 or 128 bits on systems with 8-bit bytes, but not always powers of 2: e.g. 24 bits is common for DSPs, and there are 80-bit and 96-bit floating-point types.

Paul R
0

There are a few cases where integral types must be an exact power of two in width. If the exact-width types in <stdint.h> exist, such as int16_t or uint32_t, their widths must be exactly that size, with no padding. Floating-point math that declares itself to follow the IEEE standard forces float and double to be powers of two in width (although long double often is not). There are additionally the types char16_t and char32_t, in the standard library or built into C++, defined as exact-width types. The requirements about support for UTF-8 in effect mean that char and unsigned char have to be exactly 8 bits wide.
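
A small sketch of that exact-width guarantee (C11; the assertions are illustrative, and the code simply won't compile on a platform that can't provide these types):

#include <inttypes.h>
#include <limits.h>
#include <stdio.h>

/* Exact-width types carry no padding bits, so their object sizes follow
 * directly; a platform without them would reject this code outright. */
_Static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
_Static_assert(sizeof(int16_t) == 2, "int16_t is exactly 16 bits");
_Static_assert(sizeof(uint32_t) == 4, "uint32_t is exactly 32 bits");

int main(void)
{
    uint32_t x = UINT32_MAX;   /* exactly 2^32 - 1 on every conforming platform */
    printf("UINT32_MAX = %" PRIu32 "\n", x);
    return 0;
}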

In practice, a lot of legacy code would already have broken on any machine that didn’t support types exactly 8, 16, 32 and 64 bits wide. For example, any program that reads or writes ASCII or tries to connect to a network would break.

Some historically-important mainframes and minicomputers had native word sizes that were multiples of 3, not powers of two, particularly the DEC PDP-6, PDP-8 and PDP-10.

This was the main reason that base 8 used to be popular in computing: since each octal digit represented three bits, a 9-, 12-, 18- or 36-bit pattern could be represented more neatly by octal digits than decimal or hex. For example, when using base-64 to pack characters into six bits instead of eight, each packed character took up two octal digits.

The two most visible legacies of those architectures today are that, by default, character escapes such as '\123' are interpreted as octal rather than decimal in C, and that Unix file permissions/masks are represented as three or four octal digits.

Davislor
0

The sizes of the standard integral types are defined as multiples of 8 bits, because a byte is 8 bits (with a few extremely rare exceptions) and the data bus of the CPU is normally a multiple of 8 bits wide.

If you really need 12-bit integers then you could use bit fields in structures (or unions) like this:

struct mystruct
{
    short int twelveBitInt : 12;
    short int threeBitInt  :  3;
    short int bitFlag      :  1;
};

This can be handy in embedded/low-level environments - but bear in mind that the overall size of the structure will still be padded out to a whole number of its underlying type, as the example below shows.
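
For instance (the layout is implementation-defined; on a typical compiler with a 16-bit short, the three fields above share a single 16-bit unit):

#include <stdio.h>

struct mystruct
{
    short int twelveBitInt : 12;   /* 12 bits */
    short int threeBitInt  :  3;   /*  3 bits */
    short int bitFlag      :  1;   /*  1 bit  */
};

int main(void)
{
    struct mystruct s = { 1000, 3, 0 };
    printf("sizeof(struct mystruct) = %zu\n", sizeof(struct mystruct));   /* commonly 2 */
    printf("twelveBitInt = %d\n", (int)s.twelveBitInt);
    return 0;
}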

GrahamS
0

They aren't necessarily. On some machines and compilers, sizeof(long double) == 12 (96 bits).

fredoverflow
0

It's not necessary for all data types to use a power of 2 as their number of bits. For example, long double uses 80 bits (though how many bits are actually allocated for it is implementation dependent).

One advantage you gain from using powers of 2 is that larger data types can be composed of smaller ones. For example, 4 chars (8 bits each) can make up an int (32 bits). In fact, some compilers used to simulate 64-bit numbers using two 32-bit numbers.
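
A small sketch of that composition (the helper name make_u32 is made up; it assumes the fixed-width types from <stdint.h> are available):

#include <inttypes.h>
#include <stdio.h>

/* Because 32 is a power-of-2 multiple of 8, four 8-bit bytes compose
 * a 32-bit value with nothing left over, using only shifts and ORs. */
static uint32_t make_u32(uint8_t b3, uint8_t b2, uint8_t b1, uint8_t b0)
{
    return ((uint32_t)b3 << 24) | ((uint32_t)b2 << 16) |
           ((uint32_t)b1 <<  8) |  (uint32_t)b0;
}

int main(void)
{
    printf("0x%08" PRIX32 "\n", make_u32(0xDE, 0xAD, 0xBE, 0xEF));   /* 0xDEADBEEF */
    return 0;
}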

Shamim Hafiz - MSFT
0

Most of the time, your computer tries to keep each data format at either a whole multiple (2, 3, 4...) or a whole fraction (1/2, 1/3, 1/4...) of the machine word size. It does this so that each time it loads N machine words, it loads a whole number of your data items and doesn't have to recombine parts later on.

You can see this in the x86 for example:

a char is 1/4th of 32-bits

a short is 1/2 of 32-bits

an int / long are a whole 32 bits

a long long is 2x 32 bits

a float is a single 32-bits

a double is two times 32-bits

a long double may be either three or four times 32 bits, depending on your compiler settings. This is because on a 32-bit machine it takes three native machine words (so no overhead) to load 96 bits, while on a 64-bit machine it would be 1.5 native machine words, so 128 bits is more efficient (no recombining). The actual data content of a long double on x86 is 80 bits, so both of these are already padded.

One last aside: the computer doesn't always load data in its native word size. It first fetches a cache line and then reads from that in native machine words. A cache line is larger, usually around 64 or 128 bytes. It's very useful for a meaningful piece of data to fit inside a cache line rather than straddle the edge, because otherwise you'd have to load two whole cache lines just to read it. That's why most computer structures are a power of two in size: they fit into any power-of-two-sized storage either half, completely, double or more times over, so you're guaranteed never to end up straddling a boundary.
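
As a sketch of that idea (C11; the 64-byte line size is an assumption, real hardware may use 32 or 128 bytes):

#include <stdalign.h>
#include <stdio.h>

/* Forcing a structure to a power-of-2 size and alignment so that an
 * array of them never straddles an assumed 64-byte cache line. */
struct record {
    alignas(64) char payload[64];   /* size and alignment are both 64 */
};

int main(void)
{
    struct record table[4];
    printf("sizeof(struct record) = %zu\n", sizeof(struct record));
    printf("address of table[1]   = %p\n", (void *)&table[1]);   /* 64-byte aligned */
    return 0;
}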

dascandy