Error that is neither syntactic nor semantic?

Question

I had this question on a homework assignment (don't worry, already done):

[Using your favorite imperative language, give an example of each of ...] An error that the compiler can neither catch nor easily generate code to catch (this should be a violation of the language definition, not just a program bug)

From "Programming Language Pragmatics" (3rd ed) Michael L. Scott

My answer was to call main from main, passing in the same arguments (in C and Java), inspired by this. But I personally felt that this would just be a semantic error.

To me, this question is asking how to produce an error that is neither syntactic nor semantic, and frankly, I can't think of a situation where an error wouldn't fall into one of those two categories.

Would it be code that is susceptible to exploitation, like buffer overflows (and maybe other exploits I've never heard about)? Some sort of pitfall from the structure of the language (IDK, but lazy evaluation/weak type checking)? I'd like a simple example in Java/C++/C, but other examples are welcome.

This question appears to be off-topic because it belongs on http://programmers.stackexchange.com/ — DanMan, Jan 27 '14 at 18:53

Kninnug · Accepted Answer · 2014-01-24T01:47:39.883

8

Undefined behaviour springs to mind. A statement invoking UB is neither syntactically nor semantically incorrect; rather, the result of the code cannot be predicted, and the code is considered erroneous.

An example of this would be (from the Wikipedia page) an attempt to modify a string-constant:

char * str = "Hello world!";
str[0] = 'h'; // undefined-behaviour here
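
For contrast, here is a minimal C++ sketch of a well-defined way to get the same effect. Initializing a `char` array from the literal makes a writable copy, so the assignment touches the array and never the literal itself. The function name `with_lowercase_h` is made up for illustration.

```cpp
#include <string>

// Hypothetical helper: builds "hello world!" without ever writing to a string literal.
std::string with_lowercase_h() {
    char str[] = "Hello world!"; // array initialized from the literal: a writable copy
    str[0] = 'h';                // well-defined: modifies the local array only
    return std::string(str);
}
```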

Not all UB-statements are so easily identified though. Consider for example the possibility of signed-integer overflow in this case, if the user enters a number that is too big:

// get number from user
char input[100];
fgets(input, sizeof input, stdin);
int number = strtol(input, NULL, 10);
// print its square: possible integer-overflow if number * number > INT_MAX
printf("%i^2 = %i\n", number, number * number);

Signed-integer overflow does not necessarily occur here, and it is impossible to detect at compile time or link time because it depends on user input.
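
A run-time check is still possible if the programmer writes it by hand. The following is a minimal C++ sketch, not part of the original answer: it does the multiplication in a wider type and compares against `INT_MAX` before narrowing. It assumes `long long` is wider than `int` (checked by a `static_assert`), and the helper name `checked_square` is hypothetical.

```cpp
#include <climits>

// The widening trick below only works if long long is strictly wider than int.
static_assert(sizeof(long long) > sizeof(int), "long long must be wider than int");

// Hypothetical helper: stores n * n in `out` and returns true if the square
// fits in an int; returns false and leaves `out` untouched otherwise.
bool checked_square(int n, int &out) {
    // Multiply in long long; with a 32-bit int and a 64-bit long long
    // the product of two ints always fits.
    long long wide = static_cast<long long>(n) * n;
    if (wide > INT_MAX) { // a square is never negative, so only the upper bound matters
        return false;
    }
    out = static_cast<int>(wide);
    return true;
}
```

The caller can then print the square only when `checked_square` returns true and report an out-of-range input otherwise.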

edited Jan 24 '14 at 01:47

answered Jan 24 '14 at 01:17

Kninnug


`char * str = "Hello world!";` should not compile in C++ (not sure about C) as the type of string literals is `const char *` (well actually it decays to that through array-to-pointer-conversions…) – MFH Jan 24 '14 at 08:46
@MFH In C, string literals do not have `const` type. (You're still not allowed to actually modify them, though.) – This isn't my real name Feb 06 '14 at 03:33
@MFH: in C++ the type of string literals is "array of `const char`", but in C it's "array of `char`". So in C++03 there's an implicit conversion from string literals to `char*`, for compatibility with C. This compatibility was deprecated in C++03 and removed in C++11. So, "should not compile in C++" is true if using C++11, which most people aren't. Or if using for example `-Wwrite-strings -Werror`, which most people aren't but maybe should be ;-) – Steve Jessop Feb 06 '14 at 09:37

haccks · Answer 2 · 2015-04-10T14:37:08.850

5

Statements invoking undefined behavior¹ are semantically as well as syntactically correct but make programs behave erratically.

a[i++] = i;   // Syntax (symbolic representation) and semantics (meaning) are both correct, but this invokes UB.

Another example is using a pointer without initializing it.
Logical errors are also neither semantic nor syntactic.

¹ Undefined behavior: Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

edited Apr 10 '15 at 14:37

answered Jan 24 '14 at 01:21

haccks


I see that this one is also on the Wikipedia page provided by @Kninnug. I hadn't thought that such a simple statement could have UB. – SGM1 Jan 24 '14 at 01:38
@SGM1 An even simpler one would be `i = i++;` :) – Kninnug Jan 24 '14 at 01:54
The compiler *can* catch this example, it's just that generally they don't bother. So there's some interpretation to be done on the question, does it want an example that the compiler *necessarily* cannot catch, or just one that your compiler can't catch because it's not smart enough... – Steve Jessop Jan 24 '14 at 01:56
@SteveJessop: But since these statements are grammatically (semantics) correct, compilers do not (in general) report them as errors. – haccks Jan 24 '14 at 01:58
@SteveJessop It's also due to the definition of undefined behaviour: *"behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard **imposes no requirements**"*. The compiler is not required to raise an error and may even [make demons come out of your nose](http://www.catb.org/jargon/html/N/nasal-demons.html). – Kninnug Jan 24 '14 at 02:01
My compiler generates all sorts of warnings for things where the standard doesn't require diagnostics. So while it's true that there's no need to diagnose this error, it doesn't follow that the compiler won't. Judging by the number of dupe questions about sequence points, I think it would fairly clearly be helpful to C users if compilers did catch it. Albeit not as helpful as other things that compiler developers could be working on instead. – Steve Jessop Jan 24 '14 at 02:09

Steve Jessop · Answer 3 · 2014-02-06T09:35:39.293

Here's an example for C++. Suppose we have a function:

int incsum(int &a, int &b) {
    return ++a + ++b;
}

Then the following code has undefined behavior because it modifies an object twice with no intervening sequence point:

int i = 0;
incsum(i, i);

If the call to incsum is in a different TU from the definition of the function, then it's impossible to catch the error at compile time, because neither bit of code is inherently wrong on its own. It could be detected at link time by a sufficiently intelligent linker.

You can generate as many examples of this kind as you like, where code in one TU has behavior that's conditionally undefined for certain input values passed by another TU. I went for one that's slightly obscure; you could just as easily use an invalid pointer dereference or a signed integer arithmetic overflow.

You can argue how easy it is to generate code to catch this -- I wouldn't say it's very easy, but a compiler could notice that `++a + ++b` is invalid if `a` and `b` alias the same object, and add the equivalent of `assert (&a != &b);` at that line. So detection code can be generated by local analysis.
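
Written by hand, that generated check might look roughly like the sketch below. This is an illustration, not something a compiler is known to emit: the wrapper name `incsum_checked` is hypothetical, and it throws an exception instead of using `assert` so that the failure can be observed by the caller without aborting the program.

```cpp
#include <stdexcept>

// The function from the example above: undefined behavior if a and b
// refer to the same object.
int incsum(int &a, int &b) {
    return ++a + ++b;
}

// Hypothetical wrapper: rejects aliasing arguments before the undefined
// expression can be evaluated.
int incsum_checked(int &a, int &b) {
    if (&a == &b) {
        throw std::invalid_argument("incsum: arguments alias the same object");
    }
    return incsum(a, b);
}
```

With distinct objects the wrapper behaves like `incsum`; a call such as `incsum_checked(i, i)` throws before any increment happens.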

@ElchononEdelson: thanks, fixed. I have no idea why I wrote "C" :-) — Steve Jessop, Feb 06 '14 at 09:42
