When does knowingly utilizing unspecified behaviour provide benefit without sacrificing correctness?

Question

I'll start with the background on the terms being used.

Correct

"In theoretical computer science, correctness of an algorithm is asserted when it is said that the algorithm is correct with respect to a specification." - the correctness tag on stack overflow.

Undefined

Undefined behaviour is where anything is allowed to happen. In effect, the possibilities of what can happen are infinite. Examples are dereferencing a nullptr in c++, and dividing by zero.

Well-defined

This is where there is one and only one result possible.

Implementation defined

This is where an implementation changes the number of possibilities. If something being implementation defined results in a definition equivalent to undefined or well-defined, then it is not what I'm referring to.

Unspecified

This is where there is greater than one, but less than an infinite number of possibilities.

Utilizing unspecified behaviour

What I'm referring to is the idea of opening up your program to unspecified behaviour for some benefit such as (but certainly not limited to) performance, or correctness of the program. As an example, turning a single-threaded program into a multi-threaded one would likely "utilize" unspecified behaviour for benefit.

The idea

This is where things get interesting. Or, at least the potential is there. This is the grey area of programming. There is some idea of what can or can't happen, but what happens is neither well-defined nor undefined.

A simple, and well used example of this would involve multi-threading. There is a lot left to be unknown and unknowable when working with multiple threads. Multiple threads are used in spite of this because it brings with it potential for major performance increases that would otherwise be unavailable.

Where else does knowingly implementing unspecified behaviour provide benefit without sacrificing correctness? There needs to be some benefit.

For those of you that may want to close based the idea this will invite opinion based answers - I counter with saying that questions asking for answers based on expertise are an exception, and allowed. That is my basis for this question being valid, and acceptable. This is an answerable question.

In specific reference to my previous question along similar lines, I am re-asking because I didn't phrase the question properly. I have re-worded it, and am now specifically requiring an answer to show benefit without sacrificing the correctness of the program.

The only difference between implementation-defined and unspecified behaviour is that implementation-defined behaviour must be documented by the implementers. — Joseph Mansfield, Jul 19 '14 at 19:10
@JosephMansfield - disagree, implementation defined can result in well-defined (only one possibility), whereas unspecified by definition must have more than one possibility. — Michael Gazonda, Jul 19 '14 at 19:11
You can't disagree with the definition. Implementation-defined: "behavior, for a well-formed program construct and correct data, that depends on the implementation and that each implementation documents". Unspecified: "behavior, for a well-formed program construct and correct data, that depends on the implementation" — Joseph Mansfield, Jul 19 '14 at 19:13
So really, implementation-defined behaviour is a subset of unspecified behaviour. There are certainly plenty of places where something is left implementation-defined because doing so is beneficial. — Joseph Mansfield, Jul 19 '14 at 19:14
Here's my take - "Similarly, the C Standard defines it as behavior for which the standard "provides two or more possibilities and imposes no further requirements on which is chosen in any instance".[5] The C++ Standard also notes that the range of possible behaviors is usually provided." — Michael Gazonda, Jul 19 '14 at 19:17
**I hope that my intent was clear, and that we don't need to bicker about the definitions here.** I'm specifically referring to _not_ implementation specific behaviour where it is well defined. I'm specifically referring to spots where there are multiple possibilities such as what happens in multi-threading. I did my best to bring in accurate definitions, and I hope the question is clear. — Michael Gazonda, Jul 19 '14 at 19:18
BTW: Here is an example of a supposed benefit, which turned out to sacrifice correctness after all: http://kqueue.org/blog/2012/06/25/more-randomness-or-less/ — Ben Voigt, Jul 19 '14 at 19:47

score 3 · Answer 1 · answered Jul 19 '14 at 20:30

3

Here's an article showing how to design an algorithm tolerant of indeterminate values that exist in an uninitialized buffer, thus avoiding iterating through the buffer to initialize it (and all the corresponding cache misses).

Note that in C++, using an indeterminate value is unspecified behavior only for unsigned char; for all other types it would be undefined behavior. See section 8.5 of the Standard for more information.

answered Jul 19 '14 at 20:30

Ben Voigt

277,958
43
419
720

I feel so much better now, as this is actually an answer to what I was *trying* to ask. Thank you. – Michael Gazonda Jul 19 '14 at 20:36
@MGaz: Unfortunately, I'm not convinced that the trick shown is actually allowed in C++. Pointer indexing will promote any `unsigned char` value to an `int`, which falls outside the limited permission to use indeterminate values. – Ben Voigt Jul 19 '14 at 20:38
@BenVoigt: If `unsigned char` is 8 bits, then integer promotion on an Indeterminate Value of type `unsigned char` will yield an Unspecified Value which is guaranteed to be a number in the range 0 to 255 and not a trap representation. – supercat May 05 '15 at 17:47

score 2 · Answer 2 · answered Jul 19 '14 at 19:27

2

The unspecified behavior is not for the programmer to use (for the contrary if, for the programmer to known what it need to look with care and avoid), the unspecified behavior, as all the others: undefined, implementation defined, etc... are for the compiler writer to take advantage of (for example Order of evaluation of subexpressions have unspecified behavior, for that you should not pass any subexpression with side-effects in the evaluations, and assuming that you take care of this, the compiler reorder the evaluation of subexpression as more efficient it can, some subexpression can contain same calculation that can be reused and many other optimization can take advantage of the knowledge that can do evaluated in the best order it can find.

answered Jul 19 '14 at 19:27

NetVipeC

4,402
1
17
19

I added a definition of what I'm referring to by utilizing unspecified behaviour, and give an example of what I mean. Sorry for the lack of clarity. – Michael Gazonda Jul 19 '14 at 20:06
@MGaz: Faster execution because the compiler made better use of the CPU pipeline *is* a benefit, isn't it? – Ben Voigt Jul 19 '14 at 20:13
@BenVoigt - Yes, of course :p Ugh. Was responding to "The unspecified behavior is not for the programmer to use", so wanted to clarify what I meant by the programmer "using" unspecified behaviour. This is a surprisingly difficult question for me to put into words properly. – Michael Gazonda Jul 19 '14 at 20:20

score 1 · Answer 3 · answered Jul 19 '14 at 19:25

1

The only case that leads to a program with deterministic behavior is the case when no matter which of the possibilities the implementation uses, the result is the same.

A trivial example of this is the order in which the arguments of a function call are evaluated. If no matter which way the arguments are evaluated, the resulting list of arguments is the same, then the unspecified behavior of the order does not matter.

answered Jul 19 '14 at 19:25

jxh

69,070
8
110
193

the numbers of argument is the same, but the values can change, if the subexpression has side effects (for example: you have `f(obja.a(), obja.a(), obja.a())` where _a_ is a function that increment an int member variable and returned, you would end calling to _f_ with 3 parameter but what would their values be? this must certain is not the programmers intention. – NetVipeC Jul 19 '14 at 19:32
@NetVipeC: "If no matter which way the arguments are evaluated, ...", like if `a()` always returned the same thing. – jxh Jul 19 '14 at 19:33
sorry, missed, yes, this is the part of taking care and knowing this kind of behavior. – NetVipeC Jul 19 '14 at 19:36
There are many scenarios in which a program is not required to produce a specific output, but rather to arbitrarily produce any one of many possible outputs meeting certain criteria. Having the output of such a program be affected by Unspecified behaviors would generally only be a problem if some of the outputs that could result from such behaviors would fail to meet the appropriate criteria. – supercat May 05 '15 at 17:51
@supercat: Please expand my definition of "the result is the same" to mean "the result satisfies the same correctness criteria". – jxh May 05 '15 at 20:53
@jxh: I'd say "all allowable variations in Unspecified Behavior will yield results that meet requirements". There are some cases where reproducibility is required, in which case a program which always outputs "23" or always outputs "24" might be acceptable, but a program which outputs "23" or "24" based upon Unspecified Behavior would not be. – supercat May 05 '15 at 21:33
@supercat: The nuance is too subtle for me. The word "deterministic" is meant to address reproducibility, and the word "result" is meant to address an acceptable outcome. – jxh May 05 '15 at 22:24

score 0 · Answer 4 · edited May 23 '17 at 11:46

0

I have made a "very useful" program that tells you whether the first or second operand of + in the expression foo() + bar() is evaluated first when you run the program:

#include <iostream>

bool done = false;

int foo() {
    if (!done) {
        std::cout << "First operand evaluated first" << std::endl;
        done = true;
    }
    return 0;
}

int bar() {
    if (!done) {
        std::cout << "Second operand evaluated first" << std::endl;
        done = true;
    }
    return 0;
}

int main()
{
    foo() + bar();
}

So sure, you can utilise unspecified behaviour, but you probably don't want to. In a typical program, you want your unspecified behaviour to either resolve to a single execution path or to allow multiple execution paths that are equally valid for your program. This was described in the question you linked to.

Regardless, all unspecified behaviour provides benefit. The benefit is to the implementation (and transitively to the developer and the user) because it gives it the freedom to perform optimisations (sometimes specific to a particular machine).

edited May 23 '17 at 11:46

Community

1
1

answered Jul 19 '14 at 19:26

Joseph Mansfield

108,238
20
242
324

This is a good example of what's different between this question, and the other one you're referring to. This is not an answer to the question here, as you have not shown how there is any benefit to whether or not the first or second operand is called first. – Michael Gazonda Jul 19 '14 at 19:29
@MGaz The benefit is that my program does its job correctly. Isn't "benefit" entirely dependent on what you're trying to do? – Joseph Mansfield Jul 19 '14 at 19:30
The implication of "not sacrificing correctness" is that the program is already correct, and so being correct wouldn't be a benefit. At the very least, for correctness to be a benefit then it would require *more* correctness than was there previously. – Michael Gazonda Jul 19 '14 at 19:42

score 0 · Answer 5 · edited May 23 '17 at 12:09

0

I'm not a smart rocket scientist. To me your question

1) ..does not make sense for C++

In the traditional programming that most of the readers of Stack Overflow do on the daily basis we just need the things to be defined and behave in a deterministic and predictable way resulting in software that works - http://en.wikipedia.org/wiki/Deterministic_Turing_machine. And if it worked today we want it also to work tomorrow

C++ falls into that category as well

In such a traditional world I don't see any way how an answer to your question might be actually useful and thus how this question may be useful

2) ..makes some sense in hardware oriented languages

In the VHDL programming language programmers use(work with) special variable states (see What is the purpose of the `std_logic` enumerated type in VHDL?) representing a signal value that is not known as the wire carrying the signal is not connected. Cut. Unplugged. White noise etc.

3) ..makes sense outside the traditional CPU world

There is a whole computing universe where the terms undefined and unspecified etc. totally come into play. And it is the http://en.wikipedia.org/wiki/Quantum_computer.

Google invests in the quantum computing research and THE company that works on it is D-Wave Systems Inc.

But this is really a paradigm shift (similar to programming of Nano computers)

So my-average-IQ answer is either make your question useful for C++ or move it to more appropriate Stack Exchange site (e.g. https://cs.stackexchange.com/) or quench your thirst outside the C++ programming paradigm

edited May 23 '17 at 12:09

Community

1
1

answered Jul 20 '14 at 06:03

xmojmr

8,073
5
31
54

@BenVoigt yet it is only a fake randomness http://en.wikipedia.org/wiki/Random_number_generation#.22True.22_random_numbers_vs._pseudo-random_numbers which is actually not "correct" result. It gives just good enough results but not something that would satisfy the "science" part of the "theoretical computer science" term. I still don't see any real-life meaning of any answer to the PO question (if it is what you tried to point out) – xmojmr Jul 20 '14 at 06:28
Sometimes there is a source of true entropy, sometimes not. Doesn't matter. Any RNG or PRNG is a counter-example to your answer's foundation, which is that all calculations in all C++ software should be "deterministic and predictable". In fact, security depends on having keys and nonces that are unpredictable. – Ben Voigt Jul 20 '14 at 07:12
If you read my comment under the question, you would see that people have tried to use uninitialized memory as an additional source of randomness for security purposes. That didn't work out so well (which is why it is a comment, not an answer) – Ben Voigt Jul 20 '14 at 07:13
1

But there are plenty of other algorithms, useful ones, that depend on non-determinism. For example, genetic algorithms perform optimization using stochastic search. Fuzzers generate random inputs to detect improper validation of data. Many others, too. – Ben Voigt Jul 20 '14 at 07:15
@BenVoigt thanks for explanation **1** I do not find PO's question "useful and clear" in the C++ tag **2** I read and voted up your answer and comments. **3** I tried to understand from [PO's profile](http://stackoverflow.com/users/3561240/mgaz) what "grey area of programming..amazing challenge..love it" does he mean and I tried to give an useful answer **4** you're right that my "foundation" for answer 1) is not perfect. I'm not a high-IQ rocket scientist, I'm not even an University-Certified Software Engineer, I'm just a programmer, I did not understand purpose of the question and I admit it – xmojmr Jul 20 '14 at 07:34
Even when we "want" our programs to work in a deterministic way, this often doesn't happen. All security related issues in programming are examples of not handling unspecified behaviour properly - which may have been assumed to have been well defined. In essence, there is much more going on that is unspecified than it is likely to appear there is. *User input of any kind is unspecified.* – Michael Gazonda Jul 20 '14 at 11:03
As far as the "usefulness" of this question goes, that is up to the community. I believe there is a lot of usefulness in better understanding what is deterministic, and what isn't. I think this is something that is perceived poorly at this time. – Michael Gazonda Jul 20 '14 at 11:08
@MGaz what is your expected outcome for your C++ code once you get better understanding of what is "undefined", "unspecified", "correct", "deterministic" from the answers? Did you find anything relevant to your question in my points #2 and #3? – xmojmr Jul 20 '14 at 16:56
@xmojmr - don't have an "expected outcome" :p as far as your points go, it's hard for me to get to #2 and 3 when I disagree with #1. – Michael Gazonda Jul 20 '14 at 17:42
@MGaz if your question does not have any useful impact on your C++ code (and even you don't see such value) and it is hard for you to read http://www.dwavesys.com/tutorials/background-reading-series/quantum-computing-primer because someone does not thing that C++ is the only language that can speak truth about the universe then it is hard for me to provide useful answer. Voting -1 as your question does not look like on-topic programming question. Sorry. I tried to share my "expertise" with you – xmojmr Jul 20 '14 at 18:02
@xmojmr No useful impact on my C++ code? I didn't say that... I hope very much to see answers that will positively impact my code. That's different than an expected outcome. – Michael Gazonda Jul 20 '14 at 18:12

score 0 · Answer 6 · answered Jul 27 '15 at 21:52

It's unclear which of two things you are asking:

In situations where the Standard indicates that a compiler may select from among several possible behaviors in arbitrary fashion, but where some particular compiler always uses the same alternative, can user code gainfully exploit such a compiler's consist, though Unspecified, behavior?
In situations where the Standard allows a compiler to select from among several possible behaviors, and where different behaviors would have different results, may a program whose output would be affected by Unspecified Behavior be considered "correct" if any possible output produced by such behavior would meet requirements?

I would strongly caution against reliance upon compiler discretion with regard to the sequence in which different parts of an expression are evaluated. If a loop gets unrolled it's entirely plausible that some statements within the loop might sometimes be evaluated one way and sometimes another. While there are times when it might be convenient to be able to say int x = getc()+256*getc(); the lack of any guarantee of consistency with regard to whether the first or second character will get multiplied makes such constructs pretty useless.

With regard to the second question, the answer should depend upon whether consistency of behavior is among a program's requirements. If a program would be required to produce output meeting certain criteria, and it would be acceptable if every run produced arbitrarily identical or different input, provided that every run produced output that met the criteria, then it would be perfectly legitimate for the exact choice of output to depend upon how the compiler implemented some Unspecified Behavior. In some cases, however, there may a requirement that multiple runs of a program with the same input must yield the same output, even if the requirements were silent on some details of what that output must be. Essentially, each time the program is run with new input would define a new requirement. In those cases, having output dependent upon Unspecified Behavior would be inadvisable.

When does knowingly utilizing unspecified behaviour provide benefit without sacrificing correctness?

I'll start with the background on the terms being used.

Correct

Undefined

Well-defined

Implementation defined

Unspecified

Utilizing unspecified behaviour

The idea

6 Answers6

1) ..does not make sense for C++

2) ..makes some sense in hardware oriented languages

3) ..makes sense outside the traditional CPU world