GCC treats int i=048; as an error, why couldn't GCC be more intelligent?

Question

GCC treats int i=048; as an error because 048 is parsed as an octal number, and 8 can't appear in an octal number.

But why couldn't GCC be more intelligent and treat it as a decimal number?

Because "intelligent" parsers that try to outsmart the users are known to cause disasters. [Canonical example](http://www.catb.org/jargon/html/D/DWIM.html). — Matteo Italia, Nov 05 '12 at 02:06
Why should it take the extra effort to go against the standard just so that you can waste your time putting pointless 0s at the beginning of literals? — chris, Nov 05 '12 at 02:14
Fun fact: JavaScript does exactly what you ask, with [not so intuitive results](http://stackoverflow.com/a/2003942/214671). Along this line: [PHP's](http://stackoverflow.com/questions/80646/how-do-the-equality-double-equals-and-identity-triple-equals-comparis) [or JS's](http://stackoverflow.com/a/1998224/214671) equality operator. It shows you that when you try too much to make some sense from bogus input you get... let's say "bizarre" results. — Matteo Italia, Nov 05 '12 at 02:18

Aniket Inge · Answer 1 · 2012-11-05T02:14:26.290

7

Because interpreting 048 as a decimal would go against the syntactic rules of the C language. Any integer literal that starts with 0 (and has no 0x prefix) is interpreted as octal.

And that's a good thing, because compilers strive hard to be standard-compliant.

Also, suppose you're writing the parser for your own C compiler, one that can actually "understand" that you meant 048, 049, and so on as decimal numbers. How would you write that parser? It's possible, but unbelievably complicated, and a source of tons of bugs.

edited Nov 05 '12 at 02:14

answered Nov 05 '12 at 02:05

Aniket Inge


That's the surface answer, yes, but if I may try to read OP's mind, I think he'd follow up and ask "so why does the C language say that `048` is an error?". – Adam Rosenfield Nov 05 '12 at 02:08
@AdamRosenfield: That's a rabbit hole right there, my friend. – Dietrich Epp Nov 05 '12 at 02:09
OP answered that question actually. he said 8 is an invalid octal character.. and hence it won't be an octal. So why isn't GCC smart enough to think 048 was supposed to be meant as "48" and not an invalid octal number. – Aniket Inge Nov 05 '12 at 02:10

score 6 · Answer 2 · answered Nov 05 '12 at 02:13

It's not really a fault of GCC, since it strives to conform to the standard.

But imagine if it did make an exception of the form "If a numeric token begins with 0 but isn't Octal, then treat it as decimal". Not only is this a relatively complicated rule -- "if the code doesn't make sense in the usual way, fall back to an alternative interpretation and see if it makes sense that way" -- but it also leads to all sorts of other unexpected behaviour:

/* My bearings */
int east      = 000;  // 0 in Octal = 0
int northeast = 045;  // 45 in Octal = 37
int north     = 090;  // Starts with 0 but contains 9, must be decimal = 90
int northwest = 135;  // Starts with 1, is decimal = 135
...

It's true that similar code with the existing behaviour could also pass through the compiler with unintended values for the variables. The point is that adding a special rule to help your case will leave other cases remaining. It is better to catch errors and treat them as such, than to figure out that some of them can be interpreted differently.

(FWIW, I've never used the Octal notation and find its presence scary, because in many other situations I'll pad decimal numbers with 0s for presentation. Remembering never to do that in C takes a little bit of extra brain power.)

answered Nov 05 '12 at 02:13

Edmund


I think that it was included almost only for UNIX file protection masks, never seen any other usage for octal literals (besides confusing people). – Matteo Italia Nov 05 '12 at 02:22
That sounds likely (in fact that's the only common Octal usage I can think of). There's no escaping that C and Unix grew up together. Unfortunately C has turned out to be such a useful language that sometimes we wish features like this could be consigned to history! – Edmund Nov 05 '12 at 03:37
As a modern use, octal is (or would be) great for UTF-8. Unlike hex, where the UTF-8 representation looks nothing like the unicode codepoint number, there's a trivial mapping between the unicode codepoint number (in octal) and the UTF-8 bytes (in octal). – R.. GitHub STOP HELPING ICE Nov 05 '12 at 04:16
@R..: I would like to see the Standard define alternate notations for octal and decimal, and deprecate the notation which relies solely on the leading zero [the alternate notation for decimal would be useful in some macro contexts where it's necessary to concatenate things to form a decimal number, and avoiding leading zeroes would be awkward]. – supercat Jul 30 '15 at 17:45
@R..: I've seen some assemblers that accepted `0q31`, `0t025`, and `0b11001` as equivalents to 25, and I've worked with C compilers that accepted the latter. The Standards Committee, however, doesn't consider even the latter well-enough established to justify inclusion in the language even though many embedded systems compilers have supported it for decades. – supercat Jul 30 '15 at 17:48

score 5 · Answer 3 · answered Nov 05 '12 at 02:06

The C language requires that a diagnostic (error) be issued in this case. GCC must comply.

Aside from that, the type of behavior you're advocating is extremely harmful. It would mask various bugs/typos, and introduce a very confusing inconsistency. For example, if some code contained:

int x = 01800;

and you changed the 18 to 20, the value of x would actually decrease!

Brian Cain · Accepted Answer · 2015-07-31T03:48:19.273

What would you expect the greater-intelligence-having gcc to do if you input

int i=047;

...and then, subsequently, what might you expect if one of your colleagues changed the program to read

int i=049;

...you and he or she would be floored to learn that the value had changed from 39 to 49 by only adding 2!

The Principle of Least Astonishment helps guide designers of all kinds, and it may well have been a factor in the design of the C language.

That said, less astonishing still would be octal literals that are not merely a leading zero away from decimal ones. To that end, languages like Python and Rust write octal as "0o47" (similar to the 0x prefix used for hexadecimal literals in C and related languages).
