Expression evaluation in C vs Java

Question

int y=3;
int z=(--y) + (y=10);

When executed in C, the value of z evaluates to 20, but when the same expression is executed in Java, z evaluates to 12.

Can anyone explain why this happens and what the difference is?

I feel like there could be a misunderstanding here. What do you expect `(y=10)` to do? — Jeppz, Aug 30 '20 at 09:44
In C, the evaluation order of `--y` and `y=10` is unspecified. The result is undefined. — , Aug 30 '20 at 09:59

klutt · Answer 1 · 2020-08-30T13:47:55.490

when executed in C language the value of z evaluates to 20

No, it does not. This is undefined behavior, so z could get any value, including 20. Theoretically, the program could also do anything at all, since the standard places no requirements on what a program does once it encounters undefined behavior. Read more here: Undefined, unspecified and implementation-defined behavior

As a rule of thumb, never modify a variable twice in the same expression.
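As a sketch of that rule of thumb, assuming the intended meaning is the left-to-right reading that Java uses, the expression can be split so that each statement modifies y at most once. The class and method names are hypothetical. The statements inside compute() are also valid C, and written this way they have one defined meaning in both languages.

```java
// Hypothetical example: the question's expression rewritten so that each
// statement modifies y at most once, which removes any ordering question.
class SplitStatementsDemo {
    static int compute() {
        int y = 3;
        --y;              // first modification: y is now 2
        int first = y;    // remember the value the left operand would produce
        y = 10;           // second modification, in its own statement
        return first + y; // 2 + 10
    }

    public static void main(String[] args) {
        System.out.println(compute()); // prints 12
    }
}
```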

The reason for the undefined behavior here is sequence points: y is modified twice, by --y and by y=10, with no sequence point in between. The following question is not an exact duplicate, but it explains this in more depth: Why are these constructs using pre and post-increment undefined behavior?

In C, when it comes to arithmetic operators, like + and /, the order of evaluation of the operands is not specified in the standard, so if the evaluation of those has side effects, your program becomes unpredictable. Here is an example:

#include <stdio.h>

int foo(void)
{
    printf("foo()\n");
    return 0;
}

int bar(void)
{
    printf("bar()\n");
    return 0;
}

int main(void)
{
    int x = foo() + bar(); /* the order of the two calls is not specified */
}

What will this program print? Well, we don't know. I'm not entirely sure whether this snippet invokes undefined behavior, but regardless, the output is not predictable. I asked a question about that, "Is it undefined behavior to use functions with side effects in an unspecified order?", so I'll update this answer later.
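For contrast, here is a sketch of the same two functions ported to Java, where the operands of + are always evaluated left to right, so foo() runs before bar() on every run. The class name and the call log are hypothetical additions that make the order checkable without reading stdout.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: foo() + bar() in Java. The language spec fixes left-to-right
// evaluation of the operands, so the call order is always the same.
class CallOrderDemo {
    static final List<String> log = new ArrayList<>();

    static int foo() {
        log.add("foo()");
        System.out.println("foo()");
        return 0;
    }

    static int bar() {
        log.add("bar()");
        System.out.println("bar()");
        return 0;
    }

    // Returns the calls in the order they happened.
    static List<String> run() {
        log.clear();
        int x = foo() + bar(); // foo() is always called before bar()
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints foo(), then bar(), then [foo(), bar()]
    }
}
```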

Some other operators, like || and &&, have a specified (left-to-right) order of evaluation, and this feature is used for short-circuiting. For instance, with the example functions above, foo() && bar() executes only foo(): it returns 0 (false), so the result is already known and bar() is never called.
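Java's && short-circuits in the same way. A minimal sketch, assuming hypothetical boolean-returning versions of foo() and bar() (Java's && does not accept int operands), also shows the non-short-circuit & operator, which always evaluates both operands, left to right:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical boolean versions of foo() and bar() that record each call.
class ShortCircuitDemo {
    static final List<String> calls = new ArrayList<>();

    static boolean foo() { calls.add("foo()"); return false; }
    static boolean bar() { calls.add("bar()"); return false; }

    // && stops after foo() because false && anything is false.
    static List<String> withAndAnd() {
        calls.clear();
        boolean result = foo() && bar();
        return new ArrayList<>(calls);
    }

    // & on booleans evaluates both operands, left to right.
    static List<String> withAnd() {
        calls.clear();
        boolean result = foo() & bar();
        return new ArrayList<>(calls);
    }

    public static void main(String[] args) {
        System.out.println(withAndAnd()); // prints [foo()]
        System.out.println(withAnd());    // prints [foo(), bar()]
    }
}
```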

I'm not very proficient in Java, but for completeness I want to mention that Java has basically no undefined or unspecified behavior except in very special situations, such as unsynchronized access to shared data from multiple threads. Almost everything in Java is well defined. For more details, read rzwitserloot's answer.

Isn't this sequence-point-related undefined behaviour? The order of evaluation does not matter here (when considering the UB). — 0___________, Aug 30 '20 at 09:38
@P__J__ Yes and no. Even without the sequence point issue, the unspecified order of evaluation could still explain the behavior of OP's code. — klutt, Aug 30 '20 at 09:43
This is a fantastic answer except it's missing its second part! You should at least mention that in java it is entirely specified behaviour, and 12 is printed every time, on any java version, using any compiler, or any VM impl - perhaps mention that unlike C, java has virtually no unspecified behaviour, except if you modify and read the same variable from multiple threads without doing so properly. — rzwitserloot, Aug 30 '20 at 10:21

score 3 · Accepted Answer · answered Aug 30 '20 at 10:34

There are 3 parts to this answer:

How this works in C (undefined behaviour)
How this works in Java (the spec is clear on how this should be evaluated)
Why there is a difference

For #1, you should read @klutt's fantastic answer.

For #2 and #3, you should read this answer.

How does it work in Java?

Unlike C's, Java's language specification is far more precise. For example, C only guarantees that the data type int is at least 16 bits wide, whereas the Java language spec fixes it at exactly 32 bits. This holds even on 64-bit processors with a 64-bit Java implementation.
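A quick way to see this in Java (the class name is hypothetical): Integer.SIZE is always 32, and even overflow is specified, wrapping around in two's complement. In C, by contrast, the width of int is implementation-defined and signed overflow is undefined behaviour.

```java
// Sketch: Java fixes both the width of int and the result of overflow.
class IntWidthDemo {
    static int width() {
        return Integer.SIZE; // always 32
    }

    static int overflow() {
        int max = Integer.MAX_VALUE;
        return max + 1; // wraps to Integer.MIN_VALUE; no exception, no UB
    }

    public static void main(String[] args) {
        System.out.println(width());    // prints 32
        System.out.println(overflow()); // prints -2147483648
    }
}
```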

The Java spec clearly says that x+y is evaluated left-to-right (vs. C's 'in any order you please, compiler'). Thus --y is evaluated first, which yields 2 (with the side effect of making y 2); then y=10 is evaluated, which yields 10 (with the side effect of making y 10); and then 2+10 is evaluated, which is 12.
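A small sketch of that left-to-right rule (the class and method names are hypothetical): swapping the two operands changes the result, which is exactly what a fixed evaluation order predicts.

```java
// Sketch: Java evaluates the left operand of + completely before the right one,
// so swapping the operands changes the result.
class EvalOrderDemo {
    static int original() {
        int y = 3;
        return (--y) + (y = 10); // 2 + 10: --y runs first, then y = 10
    }

    static int swapped() {
        int y = 3;
        return (y = 10) + (--y); // 10 + 9: y = 10 runs first, then --y
    }

    public static void main(String[] args) {
        System.out.println(original()); // prints 12
        System.out.println(swapped());  // prints 19
    }
}
```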

Obviously, then, a language like Java is just better? After all, undefined behaviour is pretty much a bug by definition, so what was wrong with the C spec writers that they introduced this crazy stuff?

The answer is: performance.

In C, your source code is turned into machine code by the compiler, and the machine code is then executed by the CPU. A 2-step model.

In Java, your source code is turned into bytecode by the compiler, the bytecode is then turned into machine code by the runtime, and the machine code is then executed by the CPU. A 3-step model.

You don't control what the CPU does, so in C there is only one step where optimizations can be introduced: compilation.

So C (the language) is designed to give C compilers lots of freedom to produce optimized machine code. This is a cost/benefit trade-off: at the cost of having a ton of 'undefined behaviour' in the language spec, you get the benefit of better optimizing compilers.

In Java, you get a second step, and that's where Java does its optimizations: at runtime. java.exe does it to class files; javac.exe is quite 'stupid' and optimizes almost nothing. This is on purpose: at runtime you can do a better job (for example, you can keep some bookkeeping on which of two branches is taken more often and optimize for the path the program actually takes, rather than guessing at compile time). It also shifts the cost/benefit analysis, which now says: the language spec should be clear as day.

So Java code never has undefined behaviour?

Not so. Java has a memory model which includes a ton of undefined behaviour:

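class Race {
    static class X { int a, b; }
    static final X instance = new X();

    public static void main(String[] args) {
        // Thread 1: read both fields, then overwrite them with 5 and 6.
        new Thread() { public void run() {
            int a = instance.a;
            int b = instance.b;
            instance.a = 5;
            instance.b = 6;
            System.out.print(a);
            System.out.print(b);
        }}.start();

        // Thread 2: the same, but writes 1 and 2. No synchronization anywhere.
        new Thread() { public void run() {
            int a = instance.a;
            int b = instance.b;
            instance.a = 1;
            instance.b = 2;
            System.out.print(a);
            System.out.print(b);
        }}.start();
    }
}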

The output of this program is undefined in Java. It may print 0056, 0012, 0010, 0002, 5600, 0600, and many, many more possibilities. Something like 5000 (which it could legally print) is hard to imagine: how can one thread's read of a see the other thread's write while its read of b does not?

For the exact same reason your C code produces arbitrary answers:

Optimization.

Hardcoding in the spec exactly how this code must behave would be costly: it would take away most of the room for optimization. So Java accepted that cost instead, and now has a language spec that leaves the outcome open whenever you read and write the same fields from different threads without establishing so-called happens-before relationships, for example with synchronized.
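A minimal sketch of such a guard, assuming both threads lock the same object: with each thread's reads and writes inside synchronized (instance), the two critical sections cannot overlap, and releasing and then acquiring the lock creates a happens-before edge between them. Each thread therefore sees either the initial 0 0 or both of the other thread's writes, never a mix like 5 0. The class name and the seen array, which replaces printing, are hypothetical.

```java
// Sketch: the two-thread example with each thread's reads and writes done
// while holding the same lock (the instance itself).
class SyncDemo {
    static class X { int a, b; }

    // Returns what each thread saw, as "ab" strings: seen[0] for thread 1,
    // seen[1] for thread 2.
    static String[] run() throws InterruptedException {
        X instance = new X();
        String[] seen = new String[2];

        Thread t1 = new Thread(() -> {
            synchronized (instance) {
                seen[0] = "" + instance.a + instance.b;
                instance.a = 5;
                instance.b = 6;
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (instance) {
                seen[1] = "" + instance.a + instance.b;
                instance.a = 1;
                instance.b = 2;
            }
        });
        t1.start();
        t2.start();
        // join() also establishes happens-before, so reading seen[] afterwards is safe.
        t1.join();
        t2.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        // Which thread goes first is still unspecified, so this prints either
        // "00,56" or "12,00", never a mixed pair like "50".
        System.out.println(String.join(",", run()));
    }
}
```

Note that the guard removes the torn reads but not the scheduling choice: which of the two valid outcomes you get is still up to the runtime.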

Does "undefined behavior" mean the same thing in Java? In C, it means that anything may happen, since the standard puts absolutely zero requirements on what should happen. So in your case it could - at least theoretically - print any of those options you listed, or print something completely different like "Hello World" or the whole bible, or format your hard drive or anything else. — klutt, Aug 30 '20 at 10:50
Mostly the same, yes. C compilers don't actually print the Bible either. Printing `5000` is utterly bizarre, but it is not just theoretically legitimate according to the spec; it can and does happen on many OS/CPU combinations. — rzwitserloot, Aug 30 '20 at 10:53
@klutt no it doesn't. Mostly something like "indeterminate values". — Antti Haapala -- Слава Україні, Aug 30 '20 at 13:18
@klutt to be more specific, Java has undefined behaviour, but mostly in API calls. If you call .put(x, y) from 2 threads simultaneously on a map that is not explicitly specced to handle that (e.g. a plain-jane HashMap or a LinkedHashMap), you get C-style undefined behaviour: your HashMap might start playing Yankee Doodle Dandy, and that'd fit the spec. The JMM is actually a bit more specific about what can happen without happens-before relationships, which makes that less 'undefined behaviour' and more 'Schrödinger's cat' (the cat may be alive or dead, but not anything else). — rzwitserloot, Aug 30 '20 at 13:36
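For the map case, a sketch using java.util.concurrent.ConcurrentHashMap, whose merge method is documented to perform the whole update atomically, shows the specified alternative: two threads each adding 10,000 hits always end at 20,000. The class name and the key "hits" are hypothetical; doing the same with a plain HashMap carries no such guarantee.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: concurrent updates through a map that is specified for concurrency.
class ConcurrentMapDemo {
    static int run() throws InterruptedException {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                // ConcurrentHashMap.merge performs this read-modify-write atomically.
                counts.merge("hits", 1, Integer::sum);
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return counts.get("hits");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints 20000
    }
}
```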

0___________ · Answer 3 · 2020-08-30T10:22:27.770

When executed in C language the value of z evaluates to 20

That is not true in general. The compiler you use evaluates it to 20; another one can evaluate it in a completely different way: https://godbolt.org/z/GcPsKh

This kind of behaviour is called Undefined Behaviour.

Your expression has two problems:

The order of evaluation of operands is not specified in C (except for a few operators, such as the logical operators && and ||). This is unspecified behaviour.
This expression also modifies y twice without an intervening sequence point. This is undefined behaviour.
