
While trying to understand how primitive operators such as +, -, * and / are implemented in C, I found the following snippet from an interesting answer.

// replaces the + operator
int add(int x, int y) {
    while(x) {
        int t = (x & y) << 1;
        y ^= x;
        x = t;
    }
    return y;
}

It seems that this function demonstrates how + actually works in the background, but it's too confusing for me to understand. For a long time I believed that such operations were done using assembly instructions generated by the compiler!

Is the + operator implemented as the code posted above on MOST implementations? Does this take advantage of two's complement or other implementation-dependent features?

S.S. Anne
nalzok
  • I guess most implementations will use the native `add` machine instruction, which I guess almost all CPUs have, implemented as a hardware adder that works in a few clocks. – MikeCAT Feb 26 '16 at 13:07
  • Yes, the `+` operator very likely takes advantage of implementation-defined features. These are called "machine language" and "CPU". What is your question? If you want to know how expressions are converted to machine code, please read about compiler construction. – too honest for this site Feb 26 '16 at 13:07
  • Do you mean the code I posted is totally useless, because `+`s are translated to assembly instructions such as `add` by the compiler? – nalzok Feb 26 '16 at 13:12
  • Most `+` operations will be compiled into some variant *(or combination)* of machine code `add` instructions. Your code is convoluted and useless in every real-world scenario, but it **can** serve to teach about binary operations. – Anders Marzi Tornblad Feb 26 '16 at 14:44
  • Remember `+` is polymorphic (even in C). `int_value_a + int_value_b` is completely different than `pointer_value + size_t_value` – pmg Feb 26 '16 at 14:52
  • While not how C does it (see answers below), it's quite close to how the circuitry involved can add at the very lowest level. Try working through it with pen and paper for small binary values (say 3- or 4-bit bytes) and see how it works. Now imagine how circuitry could do the same thing with electrical pulses. Now imagine doing all the bits in parallel, rather than a loop. Now you're ready to build a 1940s-era computer :D – Jon Hanna Feb 26 '16 at 15:57
  • @pmg: Well, a pointer add might well be implemented as the same machine instructions as an integer add. A better example might be double_a + double_b, which (in the 80x86 instruction set) would be converted to an FADD. – jamesqf Feb 26 '16 at 18:40
  • @pmg: That's technically not polymorphic, just overloaded. The decision of whether to perform integer arithmetic or pointer arithmetic is made at compile time based on static typing rules, and so the compiler chooses which machine instructions to compile `+` to rather than compiling it to all possible options. – jwodder Feb 26 '16 at 19:06
  • @jamesqf well, the integer operand has to be multiplied by the size of the pointer target before being added (i.e. `p+x` == `(pointer)((integer)p+4*x)`). Some CPUs may have a special instruction for doing this, or at least with specific sizes. – Random832 Feb 26 '16 at 21:32
  • there's no known physical architecture without an `add` instruction – phuclv Feb 27 '16 at 09:55
  • The type of logic circuit emulated here is called a *Ripple-Carry Adder*, fyi. – luser droog Feb 28 '16 at 03:22
  • It isn't implemented in that way in any implementation, unless you can produce a CPU without an add instruction. The code you posted illustrates what the CPU does in hardware. – user207421 Mar 27 '16 at 09:10

9 Answers


To be pedantic, the C specification does not specify how addition is implemented.

But to be realistic, the + operator on integer types smaller than or equal to the word size of your CPU gets translated directly into an addition instruction for the CPU, and larger integer types get translated into multiple addition instructions, with some extra instructions to handle the carry from one word into the next.
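
To make "multiple addition instructions" concrete, here is a rough sketch, not actual compiler output, of how a 64-bit addition might be lowered on a 32-bit machine (the function name is invented for illustration). Real compilers would typically emit an add/add-with-carry instruction pair rather than an explicit comparison:

#include <stdint.h>

/* Illustrative lowering of a 64-bit add on a 32-bit machine:
   add the low words, detect the carry-out, and propagate it
   into the high-word add. */
uint64_t add64_via_32(uint32_t a_lo, uint32_t a_hi,
                      uint32_t b_lo, uint32_t b_hi) {
    uint32_t lo = a_lo + b_lo;          /* low-word add                */
    uint32_t carry = lo < a_lo;         /* unsigned wrap => carry out  */
    uint32_t hi = a_hi + b_hi + carry;  /* high-word add plus carry    */
    return ((uint64_t)hi << 32) | lo;
}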

The CPU internally uses logic circuits to implement the addition, and does not use loops, bitshifts, or anything that has a close resemblance to how C works.

orlp
  • This answer is excellent because it is presented with unusual clarity and simplicity. I don't find it overly pedantic at all, merely the right dose of pedantry for the question. – Jeremy Anderson Feb 26 '16 at 14:28
  • @orlp actually, CPU logic circuits can be compiled from HDLs, and you are likely to generate an adder using loops and bitshifts vaguely similar to the OP's suggestion (but only vaguely). Said loops and bitshifts would describe the layout of hardware and how they are connected. Then again, in top-tier hardware someone might unroll said loops and bitshifts, or even do away with the HDL and manually lay out the circuit for something as performance critical as an adder. – Yakk - Adam Nevraumont Feb 26 '16 at 15:15
  • A linear adder circuit does exactly what that C code does but the loop is fully unrolled in the hardware (32 times). – usr Feb 26 '16 at 15:21
  • @usr not just unrolled, but every "step" happens simultaneously. – OrangeDog Feb 26 '16 at 18:38
  • @OrangeDog a simple hardware adder will have a carry rippling through much like this C code does, which limits parallelism. High-performance adders may use carry-lookahead circuits to reduce this. – plugwash Feb 26 '16 at 22:02
  • It's important to note that even on a processor which has an "ADD" instruction which would behave like the "+" operator in all cases that are defined by the C Standard, there's no guarantee that code will behave anything like that instruction in cases involving arithmetic overflow. Modern compilers get rather "creative" with such things, and may even negate laws of time and causality (so that an overflow severely disrupts program behavior before it occurs). – supercat Feb 26 '16 at 23:00
  • @usr: See my answer. Absolutely not true. No processor would do it like this. – gnasher729 Feb 27 '16 at 22:04
  • @supercat that's only when they notice that an overflow will occur. No compiler is so evil that it would insert an overflow check so that it can do weird stuff. Oh, and unsigned overflow is well defined to use modulo power-of-two arithmetic. – John Dvorak Feb 28 '16 at 08:59
  • @JanDvorak It depends on what you mean by "weird stuff", but there are compilers out there that do optionally insert overflow checks and specifically abort the program if signed integer overflow is detected. This isn't evil, this is very helpful during debugging. Anyway, I'm pretty sure that's not what supercat meant, it wasn't about breaking programs when overflow is detected, it was about aggressively optimising on the assumption that there's no overflow in the program, which happens to break programs that do contain overflow. –  Feb 28 '16 at 18:06
  • @JanDvorak: Some compiler writers would think it entirely appropriate for a compiler with 32-bit `int`, given something like `uint16_t sqr16(uint16_t x) { return x*x;} void foo(uint16_t x) { if (x < 60000) bar(); return sqr16(x);}` to in-line the code for `foo` but make the call to `bar` unconditional since the function would invoke Undefined Behavior unless either `x` is less than 46341 or `bar` exits without returning. Such omission would save code and improve performance in all cases where the Standard imposes any requirements, so what's the problem? – supercat Feb 29 '16 at 15:59
  • @JanDvorak: Trapped overflows are a useful option; if computers were required to emulate two's-complement wrapping behavior (and even one's-complement or sign-magnitude machines could have emulated it using signed types) programmers who need their programs to exit on overflow would have to add cumbersome code for that purpose *even when targeting machines where overflow would naturally trap*. IMHO, it's good that the Standard allows for the possibility that implementations might define "interesting" behaviors--what's bad is the lack of any *normative* behaviors in the absence of contrary spec. – supercat Mar 01 '16 at 15:04
  • @supercat http://stackoverflow.com/questions/18195715/why-is-unsigned-integer-overflow-defined-behavior-but-signed-integer-overflow-is – John Dvorak Mar 01 '16 at 15:15
  • @hvd: I should have pinged you to "Trapped overflows are a useful option". – supercat Mar 01 '16 at 15:53
  • It's not unlikely that the logic circuits that implement the hardware adder use an algorithm something like the code posted in the question. See https://en.wikipedia.org/wiki/Adder_(electronics) – Barmar Mar 01 '16 at 18:28

When you add two bits, the following is the result (truth table):

a | b | sum (a^b) | carry bit (a&b) (goes to next)
--+---+-----------+--------------------------------
0 | 0 |    0      | 0
0 | 1 |    1      | 0
1 | 0 |    1      | 0
1 | 1 |    0      | 1

So if you do a bitwise xor, you can get the sum without the carry. And if you do a bitwise and, you can get the carry bits.

Extending this observation to multibit numbers a and b:

a+b = sum_without_carry(a, b) + carry_bits(a, b) shifted by 1 bit left
    = a^b + ((a&b) << 1)

Once b is 0:

a+0 = a

So the algorithm boils down to:

Add(a, b)
  if b == 0
    return a;
  else
    carry_bits = a & b;
    sum_bits = a ^ b;
    return Add(sum_bits, carry_bits << 1);

If you get rid of the recursion and convert it to a loop:

Add(a, b)
  while(b != 0) {
    carry_bits = a & b;
    sum_bits = a ^ b;

    a = sum_bits;
    b = carry_bits << 1;  // In next loop, add carry bits to a
  }
  return a;

With the above algorithm in mind, the explanation of the code should be simpler:

int t = (x & y) << 1;

The carry bits. A carry bit is 1 when the bit one position to the right of it is 1 in both operands.

y ^= x;  // x is used now

Addition without carry (Carry bits ignored)

x = t;

Reuse x to hold the carry bits

while(x)

Repeat while there are more carry bits


A recursive implementation (easier to understand) would be:

int add(int x, int y) {
    return (y == 0) ? x : add(x ^ y, (x&y) << 1);
}
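
A quick test harness for the recursive version (a toy check; note that it assumes the intermediate shifts never push a 1 into the sign bit, since left-shifting into the sign bit of a signed int is undefined behavior):

#include <assert.h>
#include <stdio.h>

int add(int x, int y) {
    return (y == 0) ? x : add(x ^ y, (x & y) << 1);
}

int main(void) {
    assert(add(2, 6) == 8);       /* the example from the question */
    assert(add(100, 28) == 128);  /* carries ripple several times  */
    puts("all checks passed");
    return 0;
}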

Seems that this function demonstrates how + actually works in the background

No. Usually (almost always) integer addition translates to the machine instruction add. This just demonstrates an alternative implementation using bitwise xor and and.

Mohit Jain
  • This is the best answer imo, all the others state that it's usually translated to a single instruction, but this one does that and *also* explains the given function. – Nick Sweeting Mar 01 '16 at 23:00
  • @NickSweeting Thanks. The question can be interpreted in two ways, and I think the accepted answer correctly interpreted what the OP wanted to ask. – Mohit Jain Mar 02 '16 at 04:47

Seems that this function demonstrates how + actually works in the background

No. This is translated to the native add machine instruction, which actually uses the hardware adder in the ALU.

If you're wondering how the computer adds, here is a basic adder.

Everything in the computer is done using logic gates, which are mostly made of transistors. The full adder has half-adders in it.

For a basic tutorial on logic gates, and adders, see this. The video is extremely helpful, though long.

In that video, a basic half-adder is shown. If you want a brief description, this is it:

The half adder adds the two bits given. The possible combinations are:

  • Add 0 and 0 = 0
  • Add 1 and 0 = 1
  • Add 1 and 1 = 10 (binary)

So now, how does the half adder work? Well, it is made up of three logic gates: the and, the xor and the nand. The nand gives a positive output if both the inputs are negative, which handles the case of 0 and 0. The xor gives a positive output if one input is positive and the other negative, which handles the case of 1 and 0. The and gives a positive output only if both the inputs are positive, which handles the case of 1 and 1. So basically, we have now got our half-adder. But we can still only add single bits.

Now we make our full-adder. A full adder consists of calling the half-adder again and again. Now this has a carry. When we add 1 and 1, we get a carry of 1. So what the full-adder does is take the carry from the half-adder, store it, and pass it as another argument to the half-adder.

If you're confused about how the carry can be passed: you basically first add the two bits using the half-adder, and then add the sum and the carry. So now you've added the carry with the two bits. You do this again and again until the bits you have to add run out, and then you get your result.
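
Here is that chaining written out in C, purely as an illustration of the gate logic (no CPU or compiler implements + this way; the helper names are invented):

#include <stdint.h>
#include <stdio.h>

/* One full adder: combine two input bits with the carry-in,
   producing a sum bit and a carry-out. */
static void full_adder(unsigned a, unsigned b, unsigned cin,
                       unsigned *sum, unsigned *cout) {
    unsigned half = a ^ b;           /* half-adder sum           */
    *sum  = half ^ cin;              /* add the carry-in         */
    *cout = (a & b) | (half & cin);  /* carry from either stage  */
}

/* Chain 32 full adders, least significant bit first. */
uint32_t add32(uint32_t x, uint32_t y) {
    uint32_t result = 0;
    unsigned carry = 0;
    for (int i = 0; i < 32; i++) {
        unsigned sum;
        full_adder((x >> i) & 1, (y >> i) & 1, carry, &sum, &carry);
        result |= (uint32_t)sum << i;
    }
    return result;
}

int main(void) {
    printf("%u\n", add32(23, 37));   /* prints 60 */
    return 0;
}

In hardware all 32 adders exist side by side and the carry ripples through them as an electrical signal; the loop above only exists because C executes sequentially.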

Surprised? This is how it actually happens. It looks like a long process, but the computer does it in a fraction of a nanosecond; a register add usually completes in a single clock cycle, sometimes even in half a cycle. Basically, the computer has the ALU (a major part of the CPU), memory, buses, etc..

If you want to learn computer hardware, from logic gates, memory and the ALU, and simulate a computer, you can see this course, from which I learnt all this: Build a Modern Computer from First Principles

It's free if you do not want an e-certificate. Part two of the course is coming up in spring this year.

Box Box Box Box
  • A few milliseconds? For a single add? – JAB Feb 26 '16 at 14:42
  • Addition with two enregistered values is generally completed in a single clock. – Cody Gray - on strike Feb 26 '16 at 15:19
  • @Tamoghna Chowdhury: Try some fractions of a nanosecond. Register add is IIRC one clock on recent Intel processors, so with a clock speed of several GHz... And that's not counting pipelining, superscalar execution, and such. – jamesqf Feb 26 '16 at 18:51
  • This ripple-carry adder would add too much latency, so it's not even implemented this way in the hardware. – pipe Feb 26 '16 at 21:52
  • The ripple-carry adder hasn't been used by CPUs for decades, because it's too slow. Instead, they use more complex adders that can do the job in a single clock cycle (or even half a cycle, in the case of some of Intel's double-clocked ALUs). (Well, most CPUs don't use it. Low-end embedded CPUs might still use it for the low transistor count.) – Mark Feb 27 '16 at 00:57

C uses an abstract machine to describe what C code does. So how it works is not specified. There are C "compilers" that actually compile C into a scripting language, for example.

But, in most C implementations, + between two integers smaller than the machine integer size will be translated into an assembly instruction (after many steps). The assembly instruction will be translated into machine code and embedded within your executable. Assembly is a language "one step removed" from machine code, intended to be easier to read than a bunch of packed binary.

That machine code (after many steps) is then executed by the target hardware platform, where it is interpreted by the instruction decoder on the CPU. The instruction decoder takes the instruction and translates it into signals to send along "control lines". These signals route data from registers and memory through the CPU, where the values are added together, often in an arithmetic logic unit.

The arithmetic logic unit might have separate adders and multipliers, or might mix them together.

The arithmetic logic unit has a bunch of transistors that perform the addition operation, then produce the output. Said output is routed via the signals generated from the instruction decoder, and stored in memory or registers.

The layout of said transistors in both the arithmetic logic unit and instruction decoder (as well as parts I have glossed over) is etched into the chip at the plant. The etching pattern is often produced by compiling a hardware description language, which takes an abstraction of what is connected to what and how they operate and generates transistors and interconnect lines.

The hardware description language can contain shifts and loops that don't describe things happening in time (like one after another) but rather in space -- it describes the connections between different parts of hardware. Said code may look very vaguely like the code you posted above.

The above glosses over many parts and layers and contains inaccuracies. This is both from my own incompetence (I have written both hardware and compilers, but am an expert in neither) and because full details would take a career or two, and not a SO post.

Here is a SO post about an 8-bit adder. Here is a non-SO post, where you'll note some of the adders just use operator+ in the HDL! (The HDL itself understands + and generates the lower level adder code for you).

Yakk - Adam Nevraumont

Almost any modern processor that can run compiled C code will have builtin support for integer addition. The code you posted is a clever way to perform integer addition without executing an integer add opcode, but it is not how integer addition is normally performed. In fact, the function linkage probably uses some form of integer addition to adjust the stack pointer.

The code you posted relies on the observation that when adding x and y, you can decompose it into the bits they have in common and the bits that are unique to one of x or y.

The expression x & y (bitwise AND) gives the bits common to x and y. The expression x ^ y (bitwise exclusive OR) gives the bits that are unique to one of x or y.

The sum x + y can be rewritten as the sum of two times the bits they have in common (since both x and y contribute those bits) plus the bits that are unique to x or y.

(x & y) << 1 is twice the bits they have in common (the left shift by 1 effectively multiplies by two).

x ^ y is the bits that are unique to one of x or y.

So if we replace x by the first value and y by the second, the sum should be unchanged. You can think of the first value as the carries of the bitwise additions, and the second as the low-order bit of the bitwise additions.

This process continues until x is zero, at which point y holds the sum.
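
The invariant behind this is easy to verify directly; a minimal check (unsigned types keep the shift well-defined):

#include <assert.h>
#include <stdint.h>

int main(void) {
    uint32_t x = 0x2B, y = 0x3F;  /* arbitrary sample values */
    /* common bits counted twice, plus bits unique to one operand */
    assert(((x & y) << 1) + (x ^ y) == x + y);
    return 0;
}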

Tom Karzes

The code that you found tries to explain how very primitive computer hardware might implement an "add" instruction. I say "might" because I can guarantee that this method isn't used by any CPU, and I'll explain why.

In normal life, you use decimal numbers and you have learned how to add them: to add two numbers, you add the lowest two digits. If the result is less than 10, you write down the result and proceed to the next digit position. If the result is 10 or more, you write down the result minus 10, proceed to the next digit, but you remember to add 1 more. For example: 23 + 37, you add 3+7 = 10, you write down 0 and remember to add 1 more for the next position. At the 10s position, you add (2+3) + 1 = 6 and write that down. The result is 60.

You can do the exact same thing with binary numbers. The difference is that the only digits are 0 and 1, so the only possible sums are 0, 1, 2. For a 32 bit number, you would handle one digit position after the other. And that is how really primitive computer hardware would do it.

This code works differently. You know the sum of two binary digits is 2 if both digits are 1. So if both digits are 1, then you would add 1 more at the next binary position and write down 0. That's what the calculation of t does: it finds all places where both binary digits are 1 (that's the &) and moves them to the next digit position (<< 1). Then it does the addition: 0+0 = 0, 0+1 = 1, 1+0 = 1, and 1+1 is 2, but we write down 0. That's what the exclusive or operator does.

But all the 1's that you had to handle in the next digit position haven't been handled. They still need to be added. That's why the code does a loop: In the next iteration, all the extra 1's are added.

Why does no processor do it that way? Because it's a loop, and processors don't like loops, and it is slow. It's slow, because in the worst case, 32 iterations are needed: If you add 1 to the number 0xffffffff (32 1-bits), then the first iteration clears bit 0 of y and sets x to 2. The second iteration clears bit 1 of y and sets x to 4. And so on. It takes 32 iterations to get the result. However, each iteration has to process all bits of x and y, which takes a lot of hardware.

A primitive processor would do things just as quickly, in the way you do decimal arithmetic: from the lowest position to the highest. It also takes 32 steps, but each step processes only two bits plus one value from the previous bit position, so it is much easier to implement. And even in a primitive computer, one can afford to do this without having to implement loops.

A modern, fast and complex CPU will use a "conditional sum adder". Especially if the number of bits is high, for example a 64 bit adder, it saves a lot of time.

A 64 bit adder consists of two parts: First, a 32 bit adder for the lowest 32 bit. That 32 bit adder produces a sum, and a "carry" (an indicator that a 1 must be added to the next bit position). Second, two 32 bit adders for the higher 32 bits: One adds x + y, the other adds x + y + 1. All three adders work in parallel. Then when the first adder has produced its carry, the CPU just picks which one of the two results x + y or x + y + 1 is the correct one, and you have the complete result. So a 64 bit adder only takes a tiny bit longer than a 32 bit adder, not twice as long.

The 32 bit adder parts are again implemented as conditional sum adders, using multiple 16 bit adders, and the 16 bit adders are conditional sum adders, and so on.
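
Here is a sketch of that selection step in C, as illustration only: in silicon all three adders are separate circuits working simultaneously, whereas C has to evaluate them one after another.

#include <stdint.h>

/* Conditional-sum style 64-bit add built from 32-bit pieces:
   compute the high half both ways, then pick the correct one
   once the carry out of the low half is known. */
uint64_t add64(uint64_t a, uint64_t b) {
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo  = a_lo + b_lo;       /* low adder                   */
    uint32_t hi0 = a_hi + b_hi;       /* high adder assuming carry 0 */
    uint32_t hi1 = a_hi + b_hi + 1;   /* high adder assuming carry 1 */

    uint32_t carry = lo < a_lo;       /* carry out of the low half   */
    uint32_t hi = carry ? hi1 : hi0;  /* the conditional-sum choice  */
    return ((uint64_t)hi << 32) | lo;
}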

gnasher729

My question is: Is the + operator implemented as the code posted on MOST implementations?

Let's answer the actual question. All operators are implemented by the compiler as some internal data structure that eventually gets translated into code after some transformations. You can't say what code will be generated for a single addition because almost no real-world compiler generates code for individual statements.

The compiler is free to generate any code as long as it behaves as if the actual operations were performed according to the standard. But what actually happens can be something completely different.

A simple example:

static int
foo(int a, int b)
{
    return a + b;
}
[...]
    int a = foo(1, 17);
    int b = foo(x, x);
    some_other_function(a, b);

There's no need to generate any addition instructions here. It's perfectly legal for the compiler to translate this into:

some_other_function(18, x * 2);

Or maybe the compiler notices that you call the function foo a few times in a row performing simple arithmetic, and it will generate vector instructions for it. Or the result of the addition is used for array indexing later, in which case the lea instruction will be used.

You simply can't talk about how an operator is implemented because it is almost never used alone.

Art

In case a breakdown of the code helps anyone else, take the example x=2, y=6:


x isn't zero, so commence adding to y:

while(2) {

x & y = 2 because

        x: 0 0 1 0  //2
        y: 0 1 1 0  //6
      x&y: 0 0 1 0  //2

2 << 1 = 4 because << 1 shifts all bits one position to the left:

      x&y: 0 0 1 0  //2
(x&y) <<1: 0 1 0 0  //4

In summary, stash that result, 4, in t with

int t = (x & y) <<1;

Now apply the bitwise XOR y^=x:

        x: 0 0 1 0  //2
        y: 0 1 1 0  //6
     y^=x: 0 1 0 0  //4

So x=2, y=4. Finally, add t and y by setting x=t and going back to the beginning of the while loop:

x = t;

When t=0 (that is, when x=0 at the top of the loop, because no carry bits remain), finish with

return y;
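
To watch the remaining iterations (the carry bits keep moving left until they fall away), here is a small trace program:

#include <stdio.h>

int main(void) {
    int x = 2, y = 6;
    while (x) {
        int t = (x & y) << 1;       /* next round's carry bits */
        y ^= x;                     /* sum without the carries */
        x = t;
        printf("x=%d y=%d\n", x, y);
    }
    printf("result: %d\n", y);      /* prints 8 */
    return 0;
}

It prints x=4 y=4, then x=8 y=0, then x=0 y=8, at which point the loop exits and y holds the sum.
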
user1717828
  • There was already a good explanation of *why* we stash the carry bit, so I post this answer to show *how* the code is working. – user1717828 Feb 26 '16 at 14:52

Just out of interest: on the Atmega328P processor, with the avr-g++ compiler, the following code implements adding one by subtracting -1:

volatile char x;
int main ()
  {
  x = x + 1;  
  }

Generated code:

00000090 <main>:
volatile char x;
int main ()
  {
  x = x + 1;  
  90:   80 91 00 01     lds r24, 0x0100
  94:   8f 5f           subi    r24, 0xFF   ; 255
  96:   80 93 00 01     sts 0x0100, r24
  }
  9a:   80 e0           ldi r24, 0x00   ; 0
  9c:   90 e0           ldi r25, 0x00   ; 0
  9e:   08 95           ret

Notice in particular that the add is done by the subi instruction (subtract constant from register) where 0xFF is effectively -1 in this case.

Also of interest is that this particular processor does not have an addi instruction, which implies that the designers thought that a subtract of the complement would be adequately handled by the compiler-writers.

Does this take advantage of two's complement or other implementation-dependent features?

It would probably be fair to say that compiler-writers would attempt to implement the wanted effect (adding one number to another) in the most efficient way possible for that particular architecture. If that requires subtracting the complement, so be it.
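
The equivalence the compiler relies on can be checked in portable C; a toy demonstration (not AVR code) that in 8-bit wraparound arithmetic subtracting 0xFF is the same as adding 1:

#include <assert.h>
#include <stdint.h>

int main(void) {
    for (unsigned v = 0; v < 256; v++) {
        uint8_t r = (uint8_t)v;
        /* what "subi r24, 0xFF" does: wraps modulo 256, i.e. adds 1 */
        assert((uint8_t)(r - 0xFF) == (uint8_t)(r + 1));
    }
    return 0;
}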

Nick Gammon