6

They say that when having UB, a program may do whatever it wants.

But if I have UB in one statement, such as

signed char a = 0x40;
a <<= 2;

or maybe even an unused(!) zero-size variable length array:

int l = 0;
char data[l];

is this in any way tolerable as only the result is undefined, or is this "bad" nevertheless?

I am especially interested in situations like these:

signed char a = 0x40;
a <<= 2;
switch (state) {
    case X: return;
    case Y: /* do something with a */ break;
    case Z: /* do something else with a */ break;
}

Assume that case X covers the situation where the value of a is undefined, and that the other cases make use of a. It would simplify things if I were allowed to calculate this the way I want and make the distinction later.

Another situation is the one I talked about the other day:

void send_stuff()
{
    char data[4 * !!flag1 + 2 * !!flag2];
    uint8_t cursor = 0;
    if (flag1) {
        // fill 4 bytes of data into &data[cursor]
        cursor += 4;
    }
    if (flag2) {
        // fill 2 bytes of data into &data[cursor]
        cursor += 2;
    }
    for (int i=0; i < cursor; i++) {
        send_byte(data[i]);
    }
}

If both flags are unset, I have the "undefined" array data with length 0. But as I neither read from nor write to it, I don't see why and how it can possibly hurt...

glglgl
  • 3
    To clarify: are you asking if there is a difference between a result being undefined or the behaviour being undefined? – Bathsheba Jun 12 '14 at 07:48
  • 2
    As far as I'm concerned, doing a division without testing that the divisor is not 0 is in no way tolerable... – Laurent S. Jun 12 '14 at 07:49
  • @Bartdude Ok, I removed the `/ 0` example. (But even then - if I don't care about the result, I don't see a problem...) – glglgl Jun 12 '14 at 07:51
  • @Bathsheba This is one aspect of this... – glglgl Jun 12 '14 at 07:52
  • @glgl on most systems (all?) dividing by 0 may generate an interrupt at processor level that propagates up to the application. It's not just that the result is undefined. – Remo.D Jun 12 '14 at 07:56
  • Good brain-food. As near as I can see in the standard, the phrasing is fairly boilerplate concerning undefined *results* (i.e. all those places where you see "the results are undefined", etc.). If those results are never evaluated or relied on, rather simply disappearing into the ether, so would, I suspect, that undefined condition. I see nowhere in the standard, however, where *invoking* undefined behavior, not just obtaining undefined results, has any level of acceptable tolerance (trap conditions notwithstanding). – WhozCraig Jun 12 '14 at 07:57
  • @WhozCraig Thank you. I was never aware that the standard has a distinction between "undefined results" and "invoking undefined behaviour". I think that is the key point... – glglgl Jun 12 '14 at 07:58
  • @glglgl just my take. I'm sure there are plenty others out there. – WhozCraig Jun 12 '14 at 08:01
  • @glglgl: I couldn't resist tidying up the question a little. Do roll back if you don't like it and I'll get back in my box. – Bathsheba Jun 12 '14 at 08:02
  • @Bathsheba (Nearly) perfectly fine, thank you. :-) (I just changed "an" to "a"...) – glglgl Jun 12 '14 at 08:02
  • 2
    `a <<= 2;` is not undefined behavior on most architectures. It can be expanded to `a = (signed char)(((int)a)<<2);`. The conversion is implementation-defined (or raises an impl…) – Pascal Cuoq Jun 12 '14 at 08:06
  • 1
    Is this question different from http://stackoverflow.com/questions/18385020/can-code-that-will-never-be-executed-invoke-undefined-behavior ? It looks like a duplicate to me. – Pascal Cuoq Jun 12 '14 at 08:07
  • @PascalCuoq No, mine goes further. In my case, I want to allow the code to be executed, but to "ignore" the result. But its answers are very valuable. Thank you for pointing me to it. – glglgl Jun 12 '14 at 08:08
  • 2
    @glglgl This is exactly what you cannot do with undefined behavior. For instance an uninitialized variable does not have “some value”. Accessing it is undefined behavior (arguably) and the compiler may have assumed that the code branch that was accessing it was unreachable. See the last example of http://blog.frama-c.com/index.php?post/2013/03/13/indeterminate-undefined where multiplying an “unknown” value by 2 modulo 2^32 produces an odd result. – Pascal Cuoq Jun 12 '14 at 08:11
  • @PascalCuoq But aside from the result, nothing "bad" happens (such as breaking the program). – glglgl Jun 12 '14 at 08:13
  • @glglgl Just add `if (j % 2 != 0) *(char*)0 = 1;` at the end of the program. Your question relates to the question of requiring static analyzers to find the second undefined behavior along an execution path. I am strongly against this idea when asked in that context. You may be interested in some of the arguments made although the context is different (https://sites.google.com/a/cost-ic0701.org/compare2012/home2/COMPARE2012.pdf?attredirects=0&d=1 , also another article of people in the field who are also opposed to it). – Pascal Cuoq Jun 12 '14 at 08:19
  • @glglgl `int a; int b=a; /*never use b or a from here*/` is UB. By the looks of it, it can't possibly cause anything "bad" (except trap values). Yet it's UB. But if a compiler detects it and decides to do something outrageous (such as reboot), it's free to do so :-) – P.P Jun 12 '14 at 08:20
  • 1
    @glglgl: Imagine an architecture where an integer overflow would trigger a CPU interrupt that terminates the process. Imagine a compiler that trashes the stack when you attempt to define an array of length zero. The standard allows them to, because it does not *define* the required behaviour in those cases. While the compiler could conceivably be better, it's your code that broke, not the compiler. – DevSolar Jun 12 '14 at 08:21
  • As Pascal already stated `signed char a = 0x40; a <<= 2;` is not UB, so the premises are wrong. – ouah Jun 12 '14 at 08:57
  • Why don't you just use `unsigned char` instead of `signed char`? All arithmetic is well defined for unsigned types. – cmaster - reinstate monica Jun 13 '14 at 08:13
  • @cmaster That would circumvent the point of the question. It is not about this statement per se; it was meant as an example for a statement which produces an undefined result, but whose undefinedness might only affect the result itself and nothing else. – glglgl Jun 13 '14 at 08:20

3 Answers

4

Undefined behaviour means that it isn't defined by the C specification. It may very well be defined (or partially defined) for a specific compiler.

Most compilers define a behavior for signed shift.

Most compilers define whether zero-length arrays are allowed.

Sometimes you can change the behaviour with compiler flags, like --pedantic or flags that treat all warnings as errors.

So the answer to your question is:

That depends on the compiler. You need to check the documentation for your particular compiler.

Is it OK to rely on a specific result when you use something that is UB according to the C standard?

That depends on what you are coding. If it is code for a specific embedded system where the likelihood of ever porting to anywhere else is low, then by all means, rely on UB if it gives a big return. But best practice is to avoid UB when possible.

Edit:

is this in any way tolerable as only the result is undefined, or is this "bad" nevertheless?

Yes: in practice only the result is undefined, although in theory the compiler vendor may terminate the program without violating the C spec. And yes, it is bad nevertheless, because it requires additional tests to ensure that the behaviour remains the same after a change is made.

If the behaviour is unspecified, then you can observe what behaviour you get. It is best to check the generated assembly code.

You need to be aware that the behaviour can change if you change anything, though. Changes that may alter the behaviour include, but are not limited to, changes to the optimization level and the application of compiler upgrades or patches.
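A minimal sketch of a check that would notice such a change, assuming 8-bit `char` and a hosted C compiler. The names `shift_and_narrow` and `check_narrowing_assumption` are hypothetical and not from the question. It repeats the `a <<= 2` computation and asserts that the out-of-range conversion back to `signed char` yields 0, which is what GCC documents (reduction modulo 2^N) but which the C standard leaves implementation-defined.

```c
#include <assert.h>

/* Hypothetical helper: the computation from the question, spelled out. */
static signed char shift_and_narrow(void)
{
    int wide = 0x40 << 2;       /* 256, computed in int: well defined */
    return (signed char)wide;   /* out of range for an 8-bit signed char:
                                   the result is implementation-defined */
}

/* Hypothetical self-test: aborts if the compiler narrows differently
   from what the rest of the code assumes. */
static void check_narrowing_assumption(void)
{
    assert(shift_and_narrow() == 0);
}
```

Calling `check_narrowing_assumption()` from a test suite or at startup would turn a silent change in behaviour after a compiler upgrade into an immediate failure.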

The people who write the compilers are generally rational people, which means that in most cases the program will behave in the way that was easiest for the compiler developer.

Best practice is still to avoid UB when possible.

Klas Lindbäck
  • The sentences “That depends on the compiler. You need to check the documentation for your particular compiler.” apply any time UB is involved. You are not answering the question, just sidestepping it. If that makes it easier, assume that the reader of your answer **has** read the compiler's documentation, and that there is no particular mention of the undefined behavior under consideration (which means that the compiler is allowed to fly demons out of the programmer's nose, or not—this is the question). – Pascal Cuoq Jun 12 '14 at 14:24
  • Are you actually saying it is ok to have an UB somewhere in the code, and leave it there, because it is "working"? – BЈовић Jun 13 '14 at 08:01
  • 1
    @BЈовић I'm saying that it is bad to leave it there, but if you still insist that you need to do it, then at least add a test case so that you can detect if the behaviour changes. – Klas Lindbäck Jun 13 '14 at 13:40
1

You are confusing undefined and implementation defined behaviour.

Shifting a value by more bits than it has is implementation defined. It will have an effect and you need to read your compiler documentation. On some architectures, for instance, it might have no effect; on others you'll be left with zero. Implementation defined behaviour isn't portable between architectures or compiler versions, requires no diagnostics, but will be consistent between runs.

However, declaring an array with a size of 0 is undefined behaviour. The compiler is free to make optimisations based on you not doing something like that, and produce code that doesn't work when you do. The compiler is free to do anything it likes if you invoke undefined behaviour, and it's possible your program will work today, and not tomorrow, or will work until you add another line somewhere else in the program or ....

Undefined means - not defined. There is no mileage in trying to work out how it's going to behave or depending on the results of said behavior.
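For the shift in the question, one way to avoid depending on any of this is to do the arithmetic in an unsigned type, as a comment on the question suggests. A minimal sketch, assuming that the platform provides `uint8_t` and that truncation to 8 bits is the desired result; `shl8` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: left shift truncated to 8 bits, for count < 8. */
static uint8_t shl8(uint8_t value, unsigned count)
{
    assert(count < 8);  /* precondition chosen for this sketch */
    /* Shift in unsigned int, then narrow. Conversion to an unsigned
       type is defined by the standard as reduction modulo 2^8. */
    return (uint8_t)((unsigned)value << count);
}
```

With this, `shl8(0x40, 2)` evaluates to 0 without relying on compiler documentation.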

Tom Tanner
1

In the context of your send_stuff function, the compiler is free to optimize your computation of cursor:

uint8_t cursor = flag1 ? 4 + 2 * !!flag2 : 2;

While this gives a different result for cursor when both flag1 and flag2 are 0, that's fine because that would result in undefined behavior anyway, so it's allowed to do whatever it wants in that case.

This is a completely machine-independent optimization, so even if you "know" that you'll always run on the same architecture with the same compiler, you can find that one day you make a seemingly unrelated change, the compiler gets tipped into a different decision about what optimal code looks like and your previously working code is suddenly behaving differently.
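One way to sidestep the zero-length array entirely is a buffer of the fixed maximum size, which is 6 bytes here. The following is only a sketch under assumptions: `build_payload`, `MAX_PAYLOAD` and the filler bytes 0xAA and 0xBB are placeholders that stand in for the real data, and the flags are passed as parameters rather than read from globals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MAX_PAYLOAD = 4 + 2 };  /* both flags set */

/* Hypothetical helper: fills out[] and returns the number of bytes used.
   The array always has a positive size, so no flag combination is UB. */
static size_t build_payload(int flag1, int flag2, uint8_t out[MAX_PAYLOAD])
{
    size_t cursor = 0;
    if (flag1) {
        memset(&out[cursor], 0xAA, 4);  /* placeholder for 4 bytes of data */
        cursor += 4;
    }
    if (flag2) {
        memset(&out[cursor], 0xBB, 2);  /* placeholder for 2 bytes of data */
        cursor += 2;
    }
    assert(cursor <= MAX_PAYLOAD);
    return cursor;
}
```

The caller would then loop over the returned length and pass each byte to `send_byte`. With both flags unset the loop simply runs zero times.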

Paul Hankin