7

I understand that the 'natural size' is the width of integer that is processed most efficiently by particular hardware. When a short is used in an array or in arithmetic operations, it must first be converted into an int.

Q: What exactly determines this 'natural size'?

I am not looking for simple answers such as

If it has a 32-bit architecture, its natural size is 32 bits

I want to understand why this is most efficient, and why a short must be converted before doing arithmetic operations on it.

Bonus Q: What happens when arithmetic operations are conducted on a long integer?

dayuloli

4 Answers

8

Generally speaking, each computer architecture is designed such that certain type sizes provide the most efficient numerical operations. The specific size depends on the architecture, and the compiler will select an appropriate size. More detailed explanations of why hardware designers selected certain sizes for particular hardware are out of scope for Stack Overflow.

A short must be promoted to int before performing integral operations because that's the way it was in C, and C++ inherited that behavior with little or no reason to change it, since changing it could have broken existing code. I'm not sure why it was originally specified that way in C, but one could speculate that it's related to "default int", where the compiler assumed int if no type was specified.

Bonus A: from 5/9 (expressions) we learn: Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:

And then of interest specifically:

  • floating point rules that don't matter here
  • Otherwise, the integral promotions (4.5) shall be performed on both operands
  • Then, if either operand is unsigned long the other shall be converted to unsigned long.
  • Otherwise, if one operand is a long int and the other unsigned int, then if a long int can represent all the values of an unsigned int, the unsigned int shall be converted to a long int; otherwise both operands shall be converted to unsigned long int.
  • Otherwise, if either operand is long, the other shall be converted to long.

In summary the compiler tries to use the "best" type it can to do binary operations, with int being the smallest size used.
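As a minimal sketch of these rules (illustrative only, assuming a C++11 compiler), you can inspect the result types with `decltype`:

```cpp
#include <type_traits>

int main() {
    short a = 1, b = 2;
    // Integral promotion: both shorts are promoted to int, so the
    // result of the addition has type int, not short.
    static_assert(std::is_same<decltype(a + b), int>::value,
                  "short + short yields int");

    long sl = 1;
    unsigned long ul = 2;
    // One operand is unsigned long, so the other is converted to
    // unsigned long and the result is unsigned long.
    static_assert(std::is_same<decltype(sl + ul), unsigned long>::value,
                  "long + unsigned long yields unsigned long");
}
```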

Mark B
  • "In summary the compiler tries to use the "best" type it can to do binary operations, with int being the smallest size used." ~Very clear thank you! – dayuloli Jun 23 '14 at 16:58
  • Adding on, some processors may restrict the size of the natural word, due to real-estate or cost reasons. For example, a 16-bit cpu, requires 16 lines in its data bus to pass data. A 32-bit cpu requires double the space to pass data around. Remember, this includes lines to the Arithmetic Logic Unit (ALU), Comparator, Barrel Shifter, Multiplier and other units, as well as pins going to external devices, including memory. – Thomas Matthews Jun 23 '14 at 20:08
  • When performing operations on data types larger than the processor's word size, the processor must increase the number of operations. For example, when adding 64-bit units on a 32-bit machine, one must add the lower words, then promote any carry to the addition of the higher words. Much like you perform carry when adding decimal columns (digits). – Thomas Matthews Jun 23 '14 at 20:12
  • 1
    The OP asked a follow-up question and I cover where this comes from in C99 in my [answer](http://stackoverflow.com/a/24372323/1708801) and +1 for a nice answer. – Shafik Yaghmour Jun 24 '14 at 02:25
7

the 'natural size' is the width of integer that is processed most efficiently by particular hardware.

Not really. Consider the x64 architecture. Arithmetic on any size from 8 to 64 bits will be essentially the same speed. So why have all x64 compilers settled on a 32-bit int? Well, because there was a lot of code out there which was originally written for 32-bit processors, and a lot of it implicitly relied on ints being 32 bits wide. And given the near-uselessness of a type which can represent values up to nine quintillion, the extra four bytes per integer would have been virtually unused. So we've decided that 32-bit ints are "natural" for this 64-bit platform.

Compare the 80286 architecture. Only 16 bits in a register. Performing 32-bit integer addition on such a platform basically requires splitting it into two 16-bit additions. Doing virtually anything with a 32-bit value involves splitting it up, with an attendant slowdown. The 80286's "natural integer size" is most definitely not 32 bits.

So really, "natural" comes down to considerations like processing efficiency, memory usage, and programmer-friendliness. It is not an acid test. It is very much a matter of subjective judgment on the part of the architecture/compiler designer.

Sneftel
  • Here is what I got from this answer, please correct me if I interpreted it wrongly. A 64-bit processor will process `int` types between 8 and 64 bits with practically the same efficiency. By extension, a 32-bit processor will process `int` types between 8 and 32 bits with the same efficiency. But using a 64-bit `int` on a 32-bit processor requires splitting it into two operations, and this will impact the efficiency, so a 64-bit `int` will not be a 'natural size' for a 32-bit processor. And this 'natural size' is 'chosen' by the compiler, based on various factors. – dayuloli Jun 23 '14 at 17:06
  • 1
    Basically, yeah. It's worth noting that there *are* some processors where processing smaller types is slower, particularly when transferring them to and from memory. PowerPC is a good example of this. – Sneftel Jun 23 '14 at 17:50
  • 1
    @dayuloli Actually `int` has to be at least 16 bits, according to the standards. – Fred Foo Jun 23 '14 at 20:00
  • @larsmans That's true, but the (lack of) rules regarding narrowing of signed types basically allows `int8 = int8 + int16` to happen entirely through 8-bit operations, regardless of the putative type promotion involved. – Sneftel Jun 23 '14 at 22:46
  • 1
    A major reason for not making `int` 64 bits on 64-bit systems is that it would leave only two standard integer types (`char` and `short`) to cover three different sizes (8, 16, and 32 bits). To make the ILP64 model work, you'd need an additional type (`short short`?), but LP64 and LLP64 are compatible with the existing language definition. This is discussed in detail at http://www.unix.org/version2/whatsnew/lp64_wp.html . – dan04 Jun 24 '14 at 01:19
  • Not actually. 64-bit division is still much slower than 8-bit division on x86_64, as you can see in the latency list. It's the same speed for addition/subtraction. – phuclv Jun 24 '14 at 02:43
3

What exactly determines this 'natural size'?

For some processors (e.g. 32-bit ARM, and most DSP-style processors), it's determined by the architecture; the processor registers are a particular size, and arithmetic can only be done on values of that size.

Others (e.g. Intel x64) are more flexible, and there's no single "natural" size; it's up to the compiler designers to choose a size, a compromise between efficiency, range of values, and memory usage.

why this is most efficient

If the processor requires values to be a particular size for arithmetic, then choosing another size will force you to convert the values to the required size - probably at a cost.

why a short must be converted before doing arithmetic operations on it

Presumably, that was a good match for the behaviour of commonly-used processors when C was developed, half a century ago. C++ inherited the promotion rules from C. I can't really comment on exactly why it was deemed a good idea, since I wasn't born then.

What happens when arithmetic operations are conducted on a long integer?

If the processor registers are large enough to hold a long, then the arithmetic will be much the same as for int. Otherwise, the operations will have to be broken down into several operations on values split between multiple registers.
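As a rough sketch (purely illustrative, not what any particular compiler emits), this is what that splitting looks like when written out by hand: a 64-bit addition built from 32-bit halves plus an explicit carry:

```cpp
#include <cstdint>
#include <iostream>

// Add two 64-bit values using only 32-bit arithmetic: add the low
// halves first, then add the high halves plus the carry, much as a
// compiler would have to do on a 32-bit target.
std::uint64_t add64(std::uint32_t a_lo, std::uint32_t a_hi,
                    std::uint32_t b_lo, std::uint32_t b_hi) {
    std::uint32_t lo    = a_lo + b_lo;            // low word, wraps modulo 2^32
    std::uint32_t carry = (lo < a_lo) ? 1u : 0u;  // carry out of the low addition
    std::uint32_t hi    = a_hi + b_hi + carry;    // high word picks up the carry
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

int main() {
    // 0x00000001'FFFFFFFF + 1 == 0x00000002'00000000
    std::cout << std::hex << add64(0xFFFFFFFFu, 0x1u, 1u, 0u) << '\n';
}
```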

Mike Seymour
  • The promotion rules make sure that if a machine's registers are 32-bit and instructions operate on whole registers, then the compiler doesn't need to truncate the result of operations to get a `short` back (which might take an additional instruction). – Fred Foo Jun 23 '14 at 16:55
  • What determines whether this 'natural size' is determined by the architecture or by the compiler? (and is it ONLY by the architecture OR the compiler?) Is it usual for it to be determined by both? From the other answers, I get the impression that a compiler designer can choose within a range. – dayuloli Jun 23 '14 at 17:09
  • A compiler designer chooses at his own peril, if he wants his compiler's executables to work properly with existing system libraries. The architecture and OS are really the deciding factors. – Sneftel Jun 23 '14 at 17:55
  • @dayuloli: As I said: some processors only support arithmetic on a specific data size, so that is the only sensible option for a "natural" size. Others are more flexible, allowing arithmetic on various different data sizes, so there is a choice of suitable sizes. – Mike Seymour Jun 24 '14 at 00:39
0

I understand that the 'natural size' is the width of integer that is processed most efficiently by particular hardware.

That's an excellent start.

Q: What exactly determines this 'natural size'?

The paragraph above is the definition of "natural size". Nothing else determines it.

I want to understand why this is most efficient

By definition.

and why a short must be converted before doing arithmetic operations on it.

It is so because the C language definition says so. There are no deep architectural reasons (there could have been some when C was invented).

Bonus Q: What happens when arithmetic operations are conducted on a long integer?

A bunch of electrons rushes through dirty sand and meets a bunch of holes. (No, really. Ask a vague question...)

n. m. could be an AI