Should implementations issue a diagnostic if the internal fixed / non-fixed translation limits are exceeded?

Question

Context: C11:

Implementations should avoid imposing fixed translation limits whenever possible.

Consider a case: the internal fixed / non-fixed translation limits are exceeded, leading to silent generation of the wrong code.

It seems reasonable to issue a diagnostic if the internal fixed / non-fixed translation limits are exceeded. Does anyone know if the implementations already do that?

I don't see how a program hitting a translation limit can generate any code, let alone "silently". — Eugene Sh., Aug 13 '21 at 15:43
"Silent" generation of wrong code would be a major bug for any compiler. — Eugene Sh., Aug 13 '21 at 15:55
A general comment on the theme of this and some of your past questions: language standards are meant to be read *together with* common sense, not *instead of* it. — Nate Eldredge, Aug 13 '21 at 16:02
@EugeneSh.: Many, if not all, translation limits are easily reportable: A compiler can report when it is unable to allocate more memory, when an identifier is longer than it supports in significant characters, and so on. — Eric Postpischil, Aug 13 '21 at 20:04
@EugeneSh.: Most compilers will silently generate “wrong” code for programs that use too much stack space. — Eric Postpischil, Aug 13 '21 at 20:06
@EricPostpischil It is hardly "wrong", it would correctly correspond to the abstract machine described by the C code. — Eugene Sh., Aug 13 '21 at 20:18
@EugeneSh.: No, it does not correspond. The abstract machine does not have any limit on function call depth and does not crash regardless of the depth. Actual machines do have a limit and do crash. — Eric Postpischil, Aug 13 '21 at 20:26

John Bollinger · Accepted Answer · 2021-08-13T16:08:22.493

Consider a case: the internal fixed / non-fixed translation limits are exceeded, leading to silent generation of the wrong code.

The language specification says very little about what may happen if translation limits are exceeded. In fact, it hasn't anything at all to say about translation limits beyond paragraph 5.2.4.1/1 and footnote 18, which you have already been reading.

Applying a language-lawyer reading to the specifications, we can observe that they neither explicitly specify that the behavior of a program that exceeds an implementation's translation limits is undefined, nor restrict their specifications of implementation and program behavior to programs that conform to all translation limits. It follows, then, that program behavior does not fail to be defined on account of the program exceeding translation limits. As a result, your hypothetical case does not arise from a combination of a conforming implementation and a conforming program.

What the specifications leave unsaid is that implementation and program behavior are contingent on the implementation accepting the program in the first place. Conforming implementations are not required to accept all conforming programs, nor even all strictly conforming programs. The avenue open to conforming implementations when faced with a program that exceeds its translation capabilities is to reject the program. If an implementation accepts and translates a given program, then implementation conformance requires that the program behave as described by the language specification.

It seems reasonable to issue a diagnostic if the internal fixed / non-fixed translation limits are exceeded. Does anyone know if the implementations already do that?

Implementations that reject a program at translation time, whether because translation limits are exceeded or for some other reason, generally do provide appropriate diagnostics. Of course, no one here can promise that every implementation provides such a diagnostic in every case.

Overall, I think I have already answered your main concern: except inasmuch as C implementations could have bugs in this area, you do not have to be concerned about undefined program behavior arising from exceeding translation limits.

It has long been accepted that implementations need not behave in any kind of constrained fashion if stack usage exceeds available stack space. If an implementation can process at least one (possibly contrived and useless) program that exercises the translation limits in 5.2.4.1 without bombing the stack, nothing would forbid it from bombing the stack when fed any other program, nor from capriciously behaving in a fashion consistent with its having done so. — supercat, Oct 20 '21 at 23:00
Re: _... generally do provide appropriate diagnostics_: indeed. One example: for `#line 2147483647` msvc generates `warning C4112: #line requires an integer between 1 and 16777215` followed by `note: Compiler limit for line number is 16777215`. BTW, per C11 the `2147483647` is supported: "The digit sequence shall not specify zero, nor a number greater than 2147483647". — pmor, Nov 02 '21 at 21:45

Steve Summit · Answer 2 · 2021-08-13T17:48:02.800

You're asking about a couple of different things here.

Undefined behavior exists because there are things which aren't legal in C but which for various reasons it is prohibitively difficult or even downright impossible for a compiler to warn you about. For example, if you write

int a[10];
int *p = a;
for(int i = 0; i < 20; i++)
    *p++ = i;

it is very hard for a conventional C implementation to detect (either at compile time or at run time) that you have done something very wrong. Therefore, it's your job not to do this: the compiler isn't obligated to generate code that works, nor is it obligated to give you an error message telling you that the program won't work.
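For contrast, here is a minimal sketch of the same loop with the bound carried alongside the pointer. The helper name `fill_checked` and its signature are hypothetical, not something from the language or from this answer; the point is only that the check has to be written by the programmer, because the language does not supply it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store 0, 1, 2, ... into dst, but never write
 * more than cap elements. Returns the number of elements actually
 * written, so the caller can tell whether the request was truncated. */
size_t fill_checked(int *dst, size_t cap, size_t wanted)
{
    assert(dst != NULL);

    /* Clamp the request to the capacity of the destination array. */
    size_t n = wanted < cap ? wanted : cap;

    for (size_t i = 0; i < n; i++)
        dst[i] = (int)i;

    return n;
}
```

Called as `fill_checked(a, 10, 20)` on the array above, this writes only the ten elements that exist and returns 10, which lets the caller notice that the other ten were refused.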

Translation limits exist because no computer program, including a C compiler, can do everything, or access infinite amounts of memory. There will be C programs that a given compiler can't compile, not because the program contains an error, but simply because it is "too big" in some way.

Suppose your compiler has a data structure — an array — containing one element for each source line of your program. And suppose that the programmer of your C compiler was too lazy to make it a dynamically-allocated array. Suppose, for example, that the array is declared of size 1,000, meaning that you can't compile a C source file of more than 1,000 lines.

This would be a poor strategy, because it fails to honor the Standard's recommendation that implementations "avoid imposing fixed translation limits whenever possible". But that's not the question — the question for today is, with that compiler, what happens if you try compiling a 1,001-line source file?

If the compiler did the moral equivalent of that earlier code fragment I wrote, by doing something like

struct sourceline source[1000];
struct sourceline *p = source;
while(!feof(ifp))
    *p++ = parseline(ifp);

then, yes, if you tried to compile a 1,001-line source file, something undefined would happen. The compiler might corrupt its internal data structures and generate bad code for you. Or the compiler itself might crash.

But now we get to a third thing that the Standard talks about, but you didn't mention: quality of implementation issues. A compiler that not only had a fixed limit on the size of your source file, but that crashed or did something undefined if you exceeded it, would be an exceedingly poor quality of implementation. If a C program — including a C compiler — has a fixed-size array in it, then detecting and preventing possible overflow of that array is not "prohibitively difficult or even downright impossible". It is, rather, an ordinary, everyday, bread-and-butter task that every competent C programmer — and certainly a C programmer who's writing a C compiler! — must be capable of.

So, bottom line, this is a quality of implementation issue: I would posit that any decent-quality C compiler that had a fixed-size translation limit would treat a breach of that limit as an explicitly diagnosable error, not as silent undefined behavior.
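As a rough illustration of what "explicitly diagnosable" could look like for the hypothetical 1,000-line compiler, here is a sketch of a fixed-size line table that refuses the 1,001st line instead of writing past the array. Everything in it (`MAX_LINES`, `struct line_table`, `line_table_add`, the wording of the message) is invented for this example and is not taken from any real compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_LINES 1000              /* assumed fixed translation limit */

struct line_table {
    long offsets[MAX_LINES];        /* one entry per source line */
    size_t count;                   /* number of entries in use */
};

/* Record one source line. Returns 0 on success. If the fixed limit
 * would be exceeded, prints a diagnostic to diag, leaves the table
 * unchanged, and returns -1. */
int line_table_add(struct line_table *t, long offset, FILE *diag)
{
    assert(t != NULL && diag != NULL);

    if (t->count >= MAX_LINES) {
        fprintf(diag,
                "error: compiler limit: source file exceeds %d lines\n",
                MAX_LINES);
        return -1;
    }

    t->offsets[t->count++] = offset;
    return 0;
}
```

A caller would stop translating and exit with a failure status as soon as `line_table_add` returns -1, so the program is rejected rather than miscompiled.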

[Footnote: Yes, while(!feof(ifp)) is always wrong. That was an example of bad code, so I didn't worry that it also had that other egregious error in it.]

Incidentally, according to the C99 Rationale, one of the reasons the Standard is so vague about what is required of conforming implementations is that "The belief was that it is simply not practical to provide a specification which is strong enough to be useful, but which still allows for real-world problems *such as bugs*." In other words, if an implementation can correctly process a contrived and useless program that exercises the translation limits, "compiler bugs" that cause it to behave nonsensically with any other program should not render the implementation non-conforming. — supercat, Oct 05 '22 at 20:07
