53

Recently I was going through an old blog post by Eric Lippert in which, while writing about associativity he mentions that in C#, (a + b) + c is not equivalent to a + (b + c) for certain values of a, b, c.

I am not able to figure out for what types and range of arithmetic values might that hold true and why.

TaW
  • 53,122
  • 8
  • 69
  • 111
Vaibhav Singla
  • 832
  • 1
  • 10
  • 14
  • 32
    Probably floating points values... – xanatos Aug 14 '15 at 07:07
  • 7
    FWIW, this issue is not C# specific. Floating point math as implemented in hardware is not exactly associative. Indeed, it's not only hardware but any floating point implementation conforming to IEEE 754. If you need associativity for real-like numbers you need to invent your own format (and probably hire a mathematician in the process) – slebetman Aug 14 '15 at 07:27
  • 7
    @slebetman Any system that uses a fixed number of significant digits will be not associative. If we use a single significant digit for example, rounding after each operation, we have: (1 + 0.5) + 0.5 == 1.5 (rounded to 2) + 0.5 == 2.5 (rounded to 3) == 3, while 1 + (0.5 + 0.5) == 1 + 1 == 2. IEEE 754 simply uses binary digits instead of decimal digits with a fixed number of significant digits (the mantissa) – xanatos Aug 14 '15 at 07:31
  • @slebetman: Rational arithmetic would do the trick, no mathematician required. – user541686 Aug 14 '15 at 07:48
  • 1
    @Mehrdad: xanatos's comment above shows why a mathematician is needed. Turns out it's not possible but I as an engineer didn't know that. – slebetman Aug 14 '15 at 07:53
  • @slebetman: I think you missed the fact that he was talking about a fixed number of digits whereas I wasn't. (Rational arithmetic means you represent every number as a ratio of two integers, which can be arbitrary-precision.) You really don't need a mathematician for this... – user541686 Aug 14 '15 at 08:00
  • @Mehrdad You would need rational arithmetic with infinite digits (so "real world" rational arithmetic) – xanatos Aug 14 '15 at 08:03
  • @xanatos: Not quite -- that's only if you want to go beyond basic arithmetic (`+ - * /`), which would then usually be a library feature rather than a language feature. For basic arithmetic in the language, you don't. In fact the whole *point* of representing numbers as rationals is so you don't need to store an infinite number of digits for basic arithmetic, since the rationals are closed under basic arithmetic. (This is a nontrivial observation; if it's not obvious at first glance that's normal.) – user541686 Aug 14 '15 at 08:04
  • 3
    @LưuVĩnhPhúc: How is a C# question a duplicate of a C question...? – user541686 Aug 14 '15 at 08:09
  • 1
    read this [Is floating point math broken?](http://stackoverflow.com/q/588004/995714) [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) [Floating point addition is not associative](http://www.walkingrandomly.com/?p=5380) – phuclv Aug 14 '15 at 08:09
  • 3
    @Mehrdad this is related to the property of floating-point arithmetic, which is language agnostic – phuclv Aug 14 '15 at 08:10
  • 4
    @LưuVĩnhPhúc: ...what? Just because the answer (and reason) happens to be the same doesn't mean the question is a duplicate! Heck, the OP didn't even say anything about floating-point. How in the world would his question be a duplicate of a question about floating-point?! I wish I could downvote comments... – user541686 Aug 14 '15 at 08:12
  • 2
    @Mehrdad: If the answer and reason is the same then it's a duplicate even though the question is worded differently. The concept of duplicates is so that questions that are worded differently but have the same answer should point to a single answer. – slebetman Aug 14 '15 at 08:28
  • 2
    @slebetman: I don't think you understand what closing as "duplicate" means? The description under the close reason literally says, *"This question has been asked before*". The question you linked to is **not** just a "different wording" of this question. It's an entirely different question, about a very different language, that merely happens to have the same answer. If the *answer* is a duplicate it does not logically follow that "the question has been asked before". – user541686 Aug 14 '15 at 08:35
  • 2
    @Mehrdad: That's merely an A/B problem. This question (why floating point additions are not associative) has been asked before. – slebetman Aug 14 '15 at 09:13
  • 1
    @Mehrdad: it has been discussed multiple times on Meta that if the answers are duplicate then the question is as well. That's the whole point of duplicates, actually: to have multiple different questions point to the same answer. Having multiple questions means that it will be easier to find, because different people formulate the question differently or the question shows up in different contexts (e.g. in this example C and C#). Having a single answer means that effort is focused in a single place and not scattered around the system. – Jörg W Mittag Aug 14 '15 at 10:28
  • 1
    @Mehrdad With infinite digits I meant non-fixed digits... If you have 8 bits for the denominator, so 0-255 (unsigned), then `1/2 + 1/255` can't be represented exactly (the new denominator would be 510, that is outside the range)... You have to round it in some way. I'm not sure if this rounding (depending on how it is done) can/will make it non-associative. – xanatos Aug 14 '15 at 10:29
  • @JörgWMittag: I see. Would you have a link to the Meta post? – user541686 Aug 14 '15 at 10:30
  • @xanatos: If by "infinite" you meant "unbounded" then I have to ask: did you [read my 2nd comment](http://stackoverflow.com/questions/32004190/c-sharp-associativity-math-a-b-c-a-b-c?noredirect=1#comment51912729_32004190)? I was trying to explain to slebetman exactly the same thing that you're now trying to convey to me lol. – user541686 Aug 14 '15 at 10:32
  • 2
    @Mehrdad No, I hadn't... We say the same thing... – xanatos Aug 14 '15 at 10:33

7 Answers7

78

On the range of the double type:

double dbl1 = (double.MinValue + double.MaxValue) + double.MaxValue;
double dbl2 = double.MinValue + (double.MaxValue + double.MaxValue);

The first one is double.MaxValue, the second one is double.Infinity

On the precision of the double type:

double dbl1 = (double.MinValue + double.MaxValue) + double.Epsilon;
double dbl2 = double.MinValue + (double.MaxValue + double.Epsilon);

Now dbl1 == double.Epsilon, while dbl2 == 0.

And on literally reading the question :-)

In checked mode:

checked
{
    int i1 = (int.MinValue + int.MaxValue) + int.MaxValue;
}

i1 is int.MaxValue

checked
{
    int temp = int.MaxValue;
    int i2 = int.MinValue + (temp + temp);
}

(note the use of the temp variable, otherwise the compiler will give an error directly... Technically even this would be a different result :-) Compiles correctly vs doesn't compile)

this throws an OverflowException... The results are different :-) (int.MaxValue vs Exception)

xanatos
  • 109,618
  • 12
  • 197
  • 280
  • 5
    A very good answer; however it addresses a limitation of the _range_ of the datatype, while I would guess that the question is more related to the _accuracy_ of the datatype within its range. – Codor Aug 14 '15 at 07:12
  • 3
    @Codor Added an example on precision. – xanatos Aug 14 '15 at 07:15
  • Perhaps the reason that this is so hard to understand is that the associative and other properties of numbers is theoretical. They apply ONLY when the range is UNBOUNDED, whether talking about integers or real numbers. So if we were to use every atom in the universe to represent a decimal digit, for example, we would be working with a bounded system of numbers, for which the associative and similar properties would not apply. – Fred Mitchell Aug 18 '15 at 23:15
14

one example

a = 1e-30
b = 1e+30
c = -1e+30
Luka Rahne
  • 10,336
  • 3
  • 34
  • 56
  • 10
    another more surprising example is `(0.1+0.2)+0.3` and `0.1+(0.2+0.3)` http://www.walkingrandomly.com/?p=5380 – phuclv Aug 14 '15 at 08:17
  • 3
    @Lưu Vĩnh Phúc: This kind of precision-based errors are implementation-dependent. You may get different results on Win7 running VS and on Linux running Mono, for instance. You may even get no error at all. – legrojan Aug 14 '15 at 11:09
  • @legrojan unless double precison math is done internally with higher precision – phuclv Aug 14 '15 at 11:39
  • strictly speaking kind of example I have presented is also implementation defined. There could be some implementation where example would hold - for example using Rational numbers as type. – Luka Rahne Aug 14 '15 at 12:59
8

Extending on the other answers which show how with extremes of small and large numbers you get a different result, here's an example where floating point with realistic normal numbers gives you a different answer.

In this case, instead of using numbers at the extreme limits of precision I simply do a lot of additions. The difference is between doing (((...(((a+b)+c)+d)+e)... or ...(((a+b)+(c+d))+((e+f)+(g+h)))+...

I'm using python here, but you will probably get the same results if you write this in C#. First create a list of a million values, all of which are 0.1. Add them up from the left and you see the rounding errors become significant:

>>> numbers = [0.1]*1000000
>>> sum(numbers)
100000.00000133288

Now add them again, but this time add them in pairs (there are much more efficient ways to do this that use less intermediate storage, but I kept the implementation simple here):

>>> def pair_sum(numbers):
    if len(numbers)==1:
        return numbers[0]
    if len(numbers)%2:
        numbers.append(0)
    return pair_sum([a+b for a,b in zip(numbers[::2], numbers[1::2])])

>>> pair_sum(numbers)
100000.0

This time any rounding errors are minimised.

Edit for completeness, here's a more efficient but less easy to follow implementation of a pairwise sum. It gives the same answer as the pair_sum() above:

def pair_sum(seq):
    tmp = []
    for i,v in enumerate(seq):
        if i&1:
            tmp[-1] = tmp[-1] + v
            i = i + 1
            n = i & -i
            while n > 2:
                t = tmp.pop(-1)
                tmp[-1] = tmp[-1] + t
                n >>= 1
        else:
            tmp.append(v)
    while len(tmp) > 1:
        t = tmp.pop(-1)
        tmp[-1] = tmp[-1] + t
    return tmp[0]

And here's the simple pair_sum written in C#:

using System;
using System.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static double pair_sum(double[] numbers)
        {
            if (numbers.Length==1)
            {
                return numbers[0];
            }
            var new_numbers = new double[(numbers.Length + 1) / 2];
            for (var i = 0; i < numbers.Length - 1; i += 2) {
                new_numbers[i / 2] = numbers[i] + numbers[i + 1];
            }
            if (numbers.Length%2 != 0)
            {
                new_numbers[new_numbers.Length - 1] = numbers[numbers.Length-1];
            }
            return pair_sum(new_numbers);
        }
        static void Main(string[] args)
        {
            var numbers = new double[1000000];
            for (var i = 0; i < numbers.Length; i++) numbers[i] = 0.1;
            Console.WriteLine(numbers.Sum());
            Console.WriteLine(pair_sum(numbers));
        }
    }
}

with output:

100000.000001333
100000
Duncan
  • 92,073
  • 11
  • 122
  • 156
4

This stems from the fact that ordinary value types (int, long, etc.) are stored using a fixed amount of bytes. Overflow is thus possible, when the sum of two values exceeds the byte storage capacity.

In C#, one may use BigInteger to avoid this kind of issue. BigInteger's are arbitrary in size and therefore do not create overflows.

BigInteger is only available from .NET 4.0 and above (VS 2010+).

legrojan
  • 569
  • 1
  • 9
  • 25
  • 2
    integer addition in finite field (for example 32 bit arithmetic) is associative – Luka Rahne Aug 14 '15 at 11:45
  • 1
    @LukaRahne Not if at least one of the individual sums exceeds the 32 (as per your example) bit limit. In that case, as xanatos showed above, one gets into trouble. – legrojan Aug 14 '15 at 12:12
  • 4
    that happens in case that one conducts `checked` operations. By default operations are not checked against overflows and in such case associative law holds. – Luka Rahne Aug 14 '15 at 12:44
0

Short answer is (a + b) + c == a + (b + c) mathematically, but not necessarily computationally.

Remembering that computers really work in binary, even simple decimals can result in roundoff errors when converted to internal format.

Depending on the language, even addition can incur roundoff errors, and in the above example, the roundoff error in a+b may differ from that in b+c.

One surprising offender is JavaScript: 0.1 + 0.2 != 0.3. The roundoff error is a long way down the decimal, but real and problematic.

It’s a general principal that you reduce roundoff error by adding the small parts first. This way they can accumulate before being overwhelmed by the larger number.

Manngo
  • 14,066
  • 10
  • 88
  • 110
0

A couple of similar examples:

static void A(string s, int i, int j)
{
  var test1 = (s + i) + j;
  var test2 = s + (i + j);
  var testX = s + i + j;
}

Here A("Hello", 3, 5) leads to test1 and testX being equal to "Hello35", while test2 will be "Hello8".

And:

static void B(int i, int j, long k)
{
  var test1 = (i + j) + k;
  var test2 = i + (j + k);
  var testX = i + j + k;
}

Here B(2000000000, 2000000000, 42L) leads to test1 and testX being equal to -294967254L in usual unchecked mode, while test2 becomes 4000000042L.

Jeppe Stig Nielsen
  • 60,409
  • 11
  • 110
  • 181
0

On the same blog in comments i found this, what is the meaning of thid "^" symbol in C#

a = 10 ^ x

b = -a

c = 5

(a+b) + c == 5

a + (b+c) == 0

Vipresh
  • 1,250
  • 15
  • 31