19

Are there features or semantics introduced or removed in C99 that would make a well-defined program written in C89 either

  • invalid (i.e., no longer compiling according to the C99 standard), or
  • still compiling, but with different semantics?

My findings so far, concerning plainly invalid programs:

  • implicit int (C89 §3.5.2)
  • implicit function declaration (C89 §3.3.2.2)
  • a return statement without a value in a function with a non-void return type (C89 §3.6.6.4)
  • using new keywords as identifiers (for example restrict, inline, etc.)
  • hacks involving //, which now starts a comment. However, these are almost never encountered in production code.
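
The `//` item can be sketched with a small function whose result depends on how the compiler tokenizes `//`. The function name is hypothetical and chosen only for illustration. Under strict C89 the initializer is read as `2 / 2`; under C99 (and under compilers that accept `//` as an extension) the rest of the line is a comment and the initializer is just `2`.

```c
/* Returns 1 if the compiler treats "//" as the start of a comment
 * (C99 and later, or C89 with extensions), 0 under strict C89.
 * The function name is hypothetical, for illustration only. */
int slash_slash_starts_comment(void)
{
    /* Strict C89 reads the next line as: 2 / (block comment) 2, i.e. 1.
     * C99 reads it as: 2, followed by a line comment. */
    int r = 2 //* strict C89 sees a division here */ 2
        ;
    return r == 2;
}
```

Note that this construct still compiles in both dialects, so it arguably belongs with the silent semantic changes rather than with the plainly invalid programs.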

Subtle changes that give the same code different semantics:

  • Integer division has been made well defined, for example -3 / 2 now has to truncate towards zero (C99 §6.5.5/6), instead of being implementation defined (C89 §3.3.5/6)
  • strtod gained the ability to parse hexadecimal floating-point numbers in C99, recognized by a 0x or 0X prefix
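
A minimal sketch of both changes, assuming a C99-conforming compiler and library. The helper names are hypothetical. With a C89 library, `strtod` would be expected to consume only the leading `0` of `"0x10"` and return 0.0, and a C89 compiler was allowed to produce either -1 or -2 for the division.

```c
#include <stdlib.h>

/* Quotient of -3 / 2. C99 requires -1 (truncation toward zero);
 * C89 left the choice between -1 and -2 to the implementation. */
int quotient_neg3_by_2(void)
{
    int a = -3, b = 2;
    return a / b;
}

/* Parses the text "0x10" with strtod and reports how many characters
 * were consumed. A C99 library consumes all 4 characters and returns 16.0. */
double parse_hex_text(size_t *consumed)
{
    const char *text = "0x10";
    char *end;
    double value = strtod(text, &end);

    *consumed = (size_t)(end - text);
    return value;
}
```

Code that relied on `strtod` stopping at the `x` (for example, a hand-written tokenizer) is the kind of program whose behavior changes silently.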

What have I missed?

Leandros
  • 1
    isn't it too broad? What's the reason you ask this? – Sourav Ghosh Apr 18 '16 at 19:48
  • 1
    Yes, I agree it's quite a broad question. The reason? I'm interested. I still do all my work in C89 and know the standard quite well, and I don't want to shoot myself in the foot if I switch to a later standard. – Leandros Apr 18 '16 at 19:49
  • 1
    Well, while I may agree that your interest is genuine, this is not a very good on-topic question here. Do you have any specific case to discuss? – Sourav Ghosh Apr 18 '16 at 19:51
  • 1
    I would love to narrow it down to a specific case, I don't have one on hand, though. I couldn't find anything regarding the topic anywhere, that's why I got interested. – Leandros Apr 18 '16 at 19:53
  • C89 is not standard C, neither are C90 or C99. Standard is **only** C11. If you are proficient in C, just read the standard. Links are available at the info-page. No offence, but yes, you have missed the past 17 years of C development. Note also that C89 was never an international standard. That was C90 (commonly named "ISO-C"). – too honest for this site Apr 18 '16 at 20:08
  • 3
    According to the title, the question is really about *breaking changes* in C99. It is definitely not too broad. – AnT stands with Russia Apr 18 '16 at 20:10
  • @Leandros: And others live in the past, ignoring new developments happily. Those are the reason why we are still discussing issues of a >27 year old standard. And one of the reasons C has such a bad reputation. – too honest for this site Apr 18 '16 at 20:15
  • 12
    @Olaf: Questions about language history are not off-topic here and do not in any way somehow imply that the person asking them "missed the past 17 years of C development". – AnT stands with Russia Apr 18 '16 at 20:15
  • @AnT: "I still do all my work in C89" in combination with the question very much looks like it to me. – too honest for this site Apr 18 '16 at 20:17
  • 2
    @Olaf I don't live in the past, if that is what you want to imply. And I certainly haven't missed the last 27 years. Yes, C89 and C99 are two withdrawn standards, but that doesn't change the fact that they're still very widely used. And a question similar to mine was asked (but not answered) on the Linux Kernel development mailing list, since they're still working with `gnu89`. – Leandros Apr 18 '16 at 20:18
  • 3
    @Olaf Ever heard of this so-called "Linux"? If you do development on the kernel you have to stick to C89 (or C89 with GNU extensions). – Leandros Apr 18 '16 at 20:18
  • 2
    @Olaf: It doesn't to me. – AnT stands with Russia Apr 18 '16 at 20:19
  • @Leandros: Nobody uses Linux :-P TOS rulez! – too honest for this site Apr 18 '16 at 20:28
  • @AnT: So we disagree. Anyway, I did not state the question is OT, but too broad. It is also badly researched. A simple read of the foreword of the standard would have been sufficient. Useless to cite here. But maybe OP was misled, as that is about ISO, not ANSI. – too honest for this site Apr 18 '16 at 20:32
  • 1
    @Olaf Can you point me to what you mean? I'm currently skimming through the C89 and C99 standards, for disagreements. – Leandros Apr 18 '16 at 20:35
  • @Leandros: Again: the C standard is ISO 9899:2011. Neither C99 nor C90 is (and C89 in fact never was). What you seem to be doing is learning a 17-year-old version of the standard, which was withdrawn about 5 years ago. I'm not sure (and don't care) whether the C99 version includes such a list. The current version does. Just read from the beginning. – too honest for this site Apr 18 '16 at 20:37
  • 1
    Here is a link to the final draft (identical to the final version in almost all relevant aspects): http://port70.net/~nsz/c/c11/n1570.html#Foreword – too honest for this site Apr 18 '16 at 20:40
  • 2
    @Olaf C89 was a C standard, in the United States of America. It was standardized by the American National Standards Institute (ANSI for short). In 1990, C90 was ratified by ISO, and since then C has been "owned" by ISO as ISO 9899. Are we back to nitpicking? ;) – Leandros Apr 18 '16 at 20:41
  • 1
    @Leandros: There's good news: American standards are still not mandatory for the rest of the world. Neither is DIN, btw. But my point was a different one: you might have been misled, as the standard only lists changes relative to the previous version. As C89 is not an ISO standard, you have to read the modifications against C90, which is the first ISO version. – too honest for this site Apr 18 '16 at 20:46
  • @Leandros Please don't put words in my mouth! I did not say anything about anyone's mental state. It is just that, luckily, no national standard has world-wide relevance. That is true for _every_ nationality. (I will not follow this any further.) – too honest for this site Apr 18 '16 at 20:51
  • 1
    @Olaf I didn't mean to, sorry if it sounded like that. Anyway, I agree and wish you a good evening. – Leandros Apr 18 '16 at 20:53
  • 7
    @Olaf: The 1990 ISO C standard describes exactly the same language described by the 1989 ANSI C standard, and ANSI officially adopted ISO C90 after it was published. ANSI also officially adopted the 1999 and 2011 ISO C standards shortly after they were published. As for C89/C90 and C99 being obsolete, that's strictly correct as far as ISO is concerned, but they're still relevant and it's perfectly appropriate to discuss them. You're free to ignore older editions of the standard, but there's no need to tell the rest of us we shouldn't mention them. – Keith Thompson Apr 18 '16 at 21:03
  • 2
    An answer in a comment: the C committee puts a lot of effort into not invalidating code from one version to the next unless it seems necessary. There probably isn't much more than what you have found. You will find a comprehensive list of the changes from C90 to C99 in the foreword. – Jens Gustedt Apr 18 '16 at 21:05
  • @KeithThompson: That was not the point. I suspected OP had missed the bullet list because C89 is not mentioned in the standard, but C90. I'm well aware they are almost identical. – too honest for this site Apr 18 '16 at 21:06
  • 3
    If you're asking about changes that would make valid C90 code invalid in C99 (or, worse, still valid but with different semantics), I suggest updating your question to make that clear. The phrase "major incompatibilities" is vague; "changes that broke existing code" is less so. I suggest reading the Foreword of [N1256](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf), a draft of the C99 standard. You should also take a look at [N1570](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), a draft of the C11 standard. – Keith Thompson Apr 18 '16 at 21:07
  • 2
    Thank you very much @JensGustedt, and Keith Thompson. I'll update my question accordingly. – Leandros Apr 18 '16 at 21:08
  • @JensGustedt: They changed some behaviour from C90 to C99 (e.g. integer division/remainder). Call it wishful thinking, but imo, if they had cut off more old wires, things would likely be easier now. And there would not be so many people sticking with C90 and expecting no problems. – too honest for this site Apr 18 '16 at 21:10
  • 1
    I have updated the question, and included my findings. – Leandros Apr 18 '16 at 21:12
  • 2
    Some refs http://cboard.cprogramming.com/c-programming/136047-difference-between-c89-c99.html, http://stackoverflow.com/questions/2270899/c89-vs-c99-gcc-compiler, http://forums.xkcd.com/viewtopic.php?t=12672 – chux - Reinstate Monica Apr 18 '16 at 21:28
  • 2
    "Preprocessor arithmetic done in intmax_t/uintmax_t", "integer constant type rules" , "integer promotion rules" look important. – chux - Reinstate Monica Apr 18 '16 at 21:29
  • "Have I missed anything?" - yes – M.M Apr 18 '16 at 23:25
  • 1
    There are funny tricks because of the `//` comments. You should read the Rationale for C99; it covers such changes (including the comments issue). – Jonathan Leffler Apr 19 '16 at 05:22
  • http://stackoverflow.com/questions/36704376/code-comments-now-generating-compiler-errors came up earlier today. It's not enough for a full answer, but should definitely find its way onto your list somewhere. – cf- Apr 19 '16 at 07:00
  • 1
    Digraphs are C99, not C89, which should give you some scope to break things probably using `%:`. – Flexo Apr 19 '16 at 08:09
  • 1
    @Flexo: I think digraphs are not a problem. They can only appear in contexts that would make them syntax errors in C89, unlike trigraphs which are much more pervasive and serious (but trigraphs were in C89 anyway, and were unchanged in C99 or C11). Digraphs were added in the Amendment 1 in 1994 — along with some new headers (`wchar.h`, `wctype.h`, `iso646.h` — see [List of standard header files in C and C++](https://stackoverflow.com/questions/2027991/list-of-standard-header-files-in-c-and-c) — and probably a few other changes I've forgotten about). – Jonathan Leffler Apr 19 '16 at 18:34

2 Answers

4

There are a lot of programs which would have been considered valid under C89, prior to the publication of C99, which some people insist were never valid. C89 includes a rule that requires that an object of any type may only be accessed using a pointer of that type, a related type, or a character type. Prior to the publication of C99, this rule was generally interpreted as applying only to "named" objects (variables of static or automatic duration which are accessed directly by name), and only in situations where the object in question didn't have its address taken immediately before it was used as a different pointer type. Such interpretation was motivated by a number of factors:

  1. One of the stated goals of the Standard was to fit with what existing compilers and programs were doing, and while it would have been rare for existing programs to access discrete named variables using pointers of different types other than in cases where the variable's address was taken immediately before such use, many other usages of pointer type punning were quite common.

  2. The rationale for the Standard includes as its sole example a function which receives a pointer of one primitive type to write a global variable of another primitive type in such a way that a compiler would have no particular reason to expect aliasing. Being able to keep global variables in registers is clearly a useful optimization, and the stated purpose of the rule is to allow such optimizations in cases where a compiler would have no reason to expect aliasing to occur. Outlawing constructs like *(int*)&foo=23; does nothing to aid such optimizations, since the fact that code is taking foo's address and dereferencing it should make it abundantly clear to any compiler that isn't being deliberately obtuse that the code is going to modify foo.

  3. There are many kinds of code which semantically require the ability to use memory bits as various types, and nothing in the Standard indicates that the rules were intended to make programmers jump through hoops (e.g. by using memcpy) to achieve semantics that could have been easily obtained in the absence of the rules, especially considering that using memcpy would prevent the compiler from keeping global variables in registers across the pointer accesses (thus defeating the purpose for which the rules were written in the first place).

  4. If structure types V and W have a common initial sequence, U is any union type containing both, and p is a V* which identifies the V within a U, then (W*)(U*)p may be used to access those common members, and will be equivalent to (W*)p. Unless a compiler could show that p couldn't possibly be a pointer to a member of some union containing W, it would be required to allow (W*)p to access the common members; it was more helpful to simply treat such common member access as being legitimate regardless of whether or where U might exist than to search for excuses to deny it.

  5. Nothing in the C89 rules makes clear how the "type" of a region of allocated storage is defined, or how storage which holds things of one type that are no longer needed might be re-purposed to hold things of another.

  6. Keeping track of registers allocated to named variables was easier than keeping track of registers allocated to other pointer expressions, and code which was interested in minimizing the number of loads and stores via pointers would often copy things to named variables and work on them there.

C99 added "effective type" rules which are explicitly applicable to allocated storage. Some people insist those were merely "clarifications" of rules which already existed in C89, but for the above reasons I find that viewpoint untenable. It's fashionable to claim that the only reasons compilers didn't apply aliasing rules to unnamed objects are #5 and #6, but objections #1-#4 are equally significant (and continue to apply to C99 just as much as C89). Still, since C99 added the effective type rules, many constructs which would have been treated as legitimate by most common interpretations of the C89 rules are clearly forbidden.
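
A minimal sketch of the kind of code at stake, assuming `float` is 32 bits wide and uses the IEEE 754 binary32 format (common, but not guaranteed by the Standard). The function name is hypothetical. It shows the memcpy-based workaround mentioned in point 3, with the cast-based form it replaces kept in a comment.

```c
#include <stdint.h>
#include <string.h>

/* Copies the object representation of a float into a uint32_t.
 * Assumes sizeof(float) == sizeof(uint32_t); this is an assumption
 * of the sketch, not something the Standard promises. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* The cast-based form that much pre-C99 code used instead:
 *
 *     uint32_t bits = *(uint32_t *)&f;
 *
 * A compiler that applies the type-based aliasing rules strictly is
 * free to treat this access as undefined behavior. */
```

On typical optimizing compilers the `memcpy` call is compiled down to a plain register move, although nothing requires that.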

supercat
  • A glimpse of *sanity* among the loud and often mistaken (or just plain wrong) voices seeing type-punning and strict-aliasing violations at every turn... – David C. Rankin Apr 20 '16 at 05:41
  • @DavidC.Rankin: I find it bizarre how people can be so oblivious to the notion that the Standard was intended to provide a minimum baseline for implementations on platforms that couldn't efficiently offer the same features and guarantees as more commonplace platforms, so as to allow such platforms to be used to run C programs which didn't need the features the platforms lacked. I've seen no indication whatsoever that it was intended to deprecate commonplace practices that would work on commonplace platforms, nor to suggest that platforms that could easily support such practices, shouldn't. – supercat Apr 20 '16 at 05:53
  • @DavidC.Rankin: Personally, I think that the proper way forward for C would be to have directives to select among at least three aliasing modes that would be better defined than any that presently exist: precise aliasing, where everything behaves as though all operations go through memory [slow, but compatible with any code that makes any assumptions about aliasing], 1990s-style [assumes directly-accessed named objects won't alias things accessed with foreign pointer types, but makes no assumptions about pointers aliasing each other], and strict [which would be mostly even stricter... – supercat Apr 20 '16 at 05:57
  • ...than C99 *but* would include intrinsics to notify the compiler of what things might alias]. The latter would allow much better semantics *and* much more effective optimization than would be possible under C99 rules. For example, given `void hey(float *fp, int *ip) {*fp =12.3f; *ip=6;}` it would allow the two stores to be re-ordered absent something saying *fp and *ip can be reordered--something which is not generally possible under C99 rules, since execution would be well-defined even if the pointers alias provided the next read is of type `int`. – supercat Apr 20 '16 at 06:01
  • @supercat Very interesting suggestion that different modes should be standardised, rather than the lowest common denominator. Or surely when they were adding `restrict`, they could've thrown `norestrict` or an overload of `volatile` in for free. More options == more winners, surely. C and C++ will never escape their low-level origins, nor should they imo, so it would be a lot better if they weren't so draconian about certain things like this... especially when people had come to assume a certain behaviour, based on implementations/interpretation, that suddenly becomes technically invalid. Gah! – underscore_d Apr 20 '16 at 15:43
  • @underscore_d: A big advantage of modes would be that they could be added without anyone ever having to admit that they were "wrong". There certainly exist some kinds of code which would benefit from being able to alias things more freely than would be allowable under a lax interpretation of the Standard, and there are certainly some kinds of optimization that would be useful in many cases but which compilers would not be allowed to perform under even a strict interpretation of the Standard. – supercat Apr 20 '16 at 16:03
  • @underscore_d: Since writing the above, BTW, I've done some exploring and found that most of the "modern" compilers on godbolt will break strictly-conforming programs. From what I can tell, their execution model can't recognize the concept of code which changes the effective type of an object without physically reading and writing the storage associated therewith, even though it's impossible for a compiler to generate efficient and Standard-conforming code without that ability. – supercat Sep 14 '17 at 16:59
-1

As a point of contrast, the git/git codebase still sticks to C89 and, apart from a few exceptions, does not use features from newer C standards.
This is detailed in Git 2.23 (Q3 2019) in Git Coding Guidelines.

This answer illustrates which post-C89 features have proven safe to use in a codebase that otherwise targets C89.

See commit cc0c429 (16 Jul 2019) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit fe9dc6b, 25 Jul 2019)

CodingGuidelines: spell out post-C89 rules

Even though we have been sticking to C89, there are a few handy features we borrow from more recent C language in our codebase after trying them in weather balloons and saw that nobody screamed.

Spell them out.

While at it, extend the existing variable declaration rule a bit to read better with the newly spelled out rule for the for loop.

The coding guidelines now include:

You should not use features from newer C standard, even if your compiler groks them.

There are a few exceptions to this guideline:

  • since early 2012 with e1327023ea (Git v1.7.9.2), we have been using an enum definition whose last element is followed by a comma.
    This, like an array initializer that ends with a trailing comma, can be used to reduce the patch noise when adding a new identifier at the end.

  • since mid 2017 with cbc0f81d (Git v2.15.0-rc0), we have been using designated initializers for struct (e.g. "struct t v = { .val = 'a' };")
    There are certain C99 features that might be nice to use in our code base, but we've hesitated to do so in order to avoid breaking compatibility with older compilers.
    But we don't actually know if people are even using pre-C99 compilers these days.
    If this patch can survive a few releases without complaint, then we can feel more confident that designated initializers are widely supported by our user base.
    It also is an indication that other C99 features may be supported, but not a guarantee (e.g., gcc had designated initializers before C99 existed).

  • since mid 2017 with 512f41cf (Git v2.15.0-rc0), we have been using designated initializers for array (e.g. "int array[10] = { [5] = 2 }").
    This is another test balloon to see if we get complaints from people whose compilers do not support designated initializer for arrays.
    These used to be forbidden, but we have not heard any breakage report, and they are assumed to be safe.

  • Variables have to be declared at the beginning of the block, before the first statement (i.e. -Wdeclaration-after-statement).

  • Declaring a variable in the for loop "for (int i = 0; i < 10; i++)" is still not allowed in this codebase.
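
The exceptions above can be sketched in a few lines. The struct and array initializers are taken from the examples quoted in the guidelines; the enum, the `count` member, and the function names are hypothetical and only there to make the fragment compile on its own.

```c
/* Enum whose last enumerator is followed by a comma (allowed exception). */
enum color {
    COLOR_RED,
    COLOR_GREEN,
    COLOR_BLUE,
};

struct t {
    char val;
    int count;   /* hypothetical extra member, left zero-initialized */
};

/* Designated initializer for a struct (allowed exception). */
char struct_example(void)
{
    struct t v = { .val = 'a' };

    return v.val;
}

/* Designated initializer for an array (allowed exception). The loop
 * counter is declared at the start of the block, not inside the for
 * statement, which the guidelines still forbid. */
int array_example(void)
{
    int array[10] = { [5] = 2 };
    int sum = 0;
    int i;

    for (i = 0; i < 10; i++)
        sum += array[i];
    return sum;
}
```

Elements and members without an explicit initializer are zero-initialized, so `array_example` sums a single nonzero element.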

VonC