
I just answered this question, which asked why a for loop counting to 10 billion takes so much longer than one counting to 1 billion (the OP actually aborted it after 10 minutes):

for (i = 0; i < 10000000000; i++)

My answer, like many others', was the obvious one: the iteration variable was 32-bit, so it could never reach 10 billion, and the loop became infinite.
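A minimal sketch of that explanation, assuming a 32-bit `int` as in the original question. The helper names `int_counter_can_reach` and `count_iterations` are made up for illustration:

```c
#include <limits.h> /* INT_MAX */

/* Hypothetical helper: can a counter of type int ever reach `bound`?
   An int counter can get no further than INT_MAX (2147483647 when int
   is 32 bits), so any larger bound is unreachable. */
int int_counter_can_reach(long long bound)
{
    return bound <= (long long)INT_MAX;
}

/* Hypothetical helper showing the usual fix: give the counter a type
   wide enough for the bound. long long is guaranteed to be at least
   64 bits, so it can hold 10000000000. */
long long count_iterations(long long bound)
{
    long long n = 0;
    for (long long i = 0; i < bound; i++)
        n++;
    return n;
}
```

With a 32-bit `int`, `int_counter_can_reach(10000000000LL)` returns 0, while a `long long` counter terminates for that bound (slowly, if the loop is not optimized away).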

But although I understood this problem, I still wonder what was really going on inside the compiler.

Since the literal has no L suffix, it should IMHO be of type int too, and therefore 32-bit. Due to overflow it would then be some ordinary int value within range, which the counter could reach. To actually recognize that the bound cannot be reached by an int, the compiler needs to know that it is 10 billion, and therefore treat it as a constant wider than 32 bits.

Does such a literal automatically get promoted to a type with a fitting (or at least implementation-defined) range (at least 64-bit, in this case), even without an L suffix, and is this standard behaviour? Or is something else going on behind the scenes, like UB due to overflow (is integer overflow actually UB)? Some quotes from the Standard would be nice, if any.

Although the original question was about C, I would also appreciate C++ answers if they differ.

Community
Christian Rau
  • Short version: Yes, the compiler automatically promotes literals, though I'm not sure whether this is an "everyone does it" or an "it's mandatory." Also, yes, signed integer overflow is actually UB. I leave it to someone else to dig up the standard quotes and reap the reward. – Chris Lutz Nov 13 '11 at 00:28
  • What type of variable is "i"? – Jim Rhodes Nov 13 '11 at 00:29
  • @Jim Due to OP's question and its solution it was a 32-bit variable and I myself guess it was an `int` or `unsigned int` (does signedness actually matter here?). – Christian Rau Nov 13 '11 at 00:31
  • Any time you use "IMHO" in reference to a language rule, it's a sign that you should check the standard first. – Keith Thompson Nov 13 '11 at 00:47
  • @KeithThompson Indeed, I should have delved into the standard first instead of taking the lazy route of an SO question. You might rightfully blame me for being lazy. – Christian Rau Nov 13 '11 at 00:55
  • If the loop body is empty, I'd be surprised if an optimizing compiler didn't just take the whole thing out, regardless of the loop bound. – Karl Knechtel Nov 13 '11 at 02:34
  • @Karl Indeed he said it was his real tested example and obviously the compiler didn't optimize it away. – Christian Rau Nov 13 '11 at 15:58

3 Answers


As far as C++ is concerned:

C++11, [lex.icon] ¶2

The type of an integer literal is the first of the corresponding list in Table 6 in which its value can be represented.

And Table 6, for decimal constants without a suffix, gives:

int
long int
long long int

(Interestingly, for hexadecimal or octal constants the unsigned types are also allowed, but each one comes after the corresponding signed one in the list.)

So it's clear that in that case the constant was interpreted as a long int (or as a long long int, if long int was also only 32 bits).
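One way to observe the chosen type directly is C11's `_Generic`, which postdates the C99 and C++11 texts quoted here. This is only a sketch: the enum, macro and function names are invented for the example, and it assumes a 32-bit `int`:

```c
/* Map an expression to a tag naming its (integer) type. */
enum int_kind { KIND_INT, KIND_LONG, KIND_LONG_LONG, KIND_OTHER };

#define KIND_OF(x) _Generic((x),   \
    int: KIND_INT,                 \
    long: KIND_LONG,               \
    long long: KIND_LONG_LONG,     \
    default: KIND_OTHER)

static const char *const kind_name[] = { "int", "long", "long long", "other" };

/* 1000000000 fits in a 32-bit int, so it stays an int. */
enum int_kind kind_of_one_billion(void) { return KIND_OF(1000000000); }

/* 10000000000 does not fit in a 32-bit int. It becomes long where long
   is 64 bits (e.g. x86-64 Linux), or long long where long is 32 bits
   (e.g. 64-bit Windows). */
enum int_kind kind_of_ten_billion(void) { return KIND_OF(10000000000); }

/* Readable name for printing, e.g. printf("%s\n", kind_to_string(k)). */
const char *kind_to_string(enum int_kind k) { return kind_name[k]; }
```

Either way the unsuffixed literal never ends up as `int`, which is exactly what lets the compiler see that an `int` counter can never reach it.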

Note that literals that are too big should result in a compilation error:

A program is ill-formed if one of its translation units contains an integer literal that cannot be represented by any of the allowed types.

(ibidem, ¶3)

as this sample promptly shows; it also reminds us that ideone.com uses 32-bit compilers.


I now see that the question was about C... well, it's more or less the same:

C99, §6.4.4.1

The type of an integer constant is the first of the corresponding list in which its value can be represented.

The list is the same as in the C++ standard.


Addendum: both C99 and C++11 also allow literals to be of "extended integer types" (i.e. other implementation-specific integer types) if everything else fails (C++11, [lex.icon] ¶3; C99, §6.4.4.1 ¶5, after the table).

Matteo Italia
  • Ah, so the `L` suffix I have often seen and used is not strictly necessary? Maybe only if you want to be perfectly explicit about the type. By the way, what do these citations (like lex.icon) mean? Are they placeholders, or has SO mangled them? – Christian Rau Nov 13 '11 at 00:36
  • @Chris: It's the section in the C++ standard, they are just labeled like that. – Xeo Nov 13 '11 at 00:38
  • @ChristianRau: it doesn't seem to be strictly necessary, but I would probably use them anyway: they don't hurt and make it clear that it's a `long` constant. The [lex.icon] thing is a reference inside the C++ standard; sections have both a section number and an identifier like that, and I prefer the latter since it's harder to copy wrong. :) – Matteo Italia Nov 13 '11 at 00:38
  • Also for C, see http://stackoverflow.com/questions/5396054/question-about-c-datatype-and-constant/5396318#5396318 – ninjalj Nov 13 '11 at 00:41
  • Ok, thanks then. I always thought integer literals were always `int` by default, but I guess I confused it with being signed by default. – Christian Rau Nov 13 '11 at 00:41
  • @ChristianRau: I thought the same as you before checking the standard to reply to your question. :) – Matteo Italia Nov 13 '11 at 00:43
  • The `L` suffix is useful when a value is small enough to fit in an `int`, but you want it to be `long` anyway: `42L` is of type `long`. Example: `60 * 60 * 24` is 86400, which can overflow if `int` is 16 bits; `60L * 60L * 24L` won't overflow. – Keith Thompson Nov 13 '11 at 00:46
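A small sketch of that comment's point (the function names are made up): a suffix sets the type of each operand, and the multiplication is carried out in that type. Since 16-bit `int` is rare today, the second function shows the analogous case for a 32-bit `int` using `LL`:

```c
/* Every operand is long, so the product is computed in long, which is
   at least 32 bits: no overflow even where int is only 16 bits. */
long seconds_per_day(void)
{
    return 60L * 60L * 24L;
}

/* Unsuffixed, 1000000 * 1000000 would be computed in int and overflow
   (undefined behaviour) with a 32-bit int. With LL suffixes the
   multiplication is done in long long (at least 64 bits). */
long long one_trillion(void)
{
    return 1000000LL * 1000000LL;
}
```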

From my draft of the C standard labeled ISO/IEC 9899:TC2 Committee Draft — May 6, 2005, the rules are remarkably similar to the C++ rules Matteo found:

5 The type of an integer constant is the first of the corresponding list in which its value can be represented.

Suffix      Decimal Constant          Octal or Hexadecimal Constant
-------------------------------------------------------------------
none        int                       int
            long int                  unsigned int
            long long int             long int
                                      unsigned long int
                                      long long int
                                      unsigned long long int

u or U      unsigned int              unsigned int
            unsigned long int         unsigned long int
            unsigned long long int    unsigned long long int

l or L      long int                  long int
            long long int             unsigned long int
                                      long long int
                                      unsigned long long int

Both u or U unsigned long int         unsigned long int
and l or L  unsigned long long int    unsigned long long int

ll or LL    long long int             long long int
                                      unsigned long long int

Both u or U unsigned long long int    unsigned long long int
and ll or LL 
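The separate octal/hexadecimal column has visible consequences. A small sketch, assuming a 32-bit `unsigned int` (all function names are hypothetical): the same value written in hex and in decimal gets types of different signedness, which changes the result of comparing it with -1:

```c
#include <limits.h> /* UINT_MAX */

/* Checks the assumption this example relies on. */
int uint_is_32_bits(void)
{
    return UINT_MAX == 0xFFFFFFFFu;
}

/* 0xFFFFFFFF does not fit in a 32-bit int but fits in unsigned int,
   so that is its type. -1 is then converted to unsigned int (UINT_MAX),
   and UINT_MAX < UINT_MAX is false: the result is 0. */
int hex_less_than(void)
{
    return -1 < 0xFFFFFFFF;
}

/* A decimal constant never picks an unsigned type without a u suffix:
   4294967295 becomes long or long long, both signed, so the
   comparison is simply true: the result is 1. */
int dec_less_than(void)
{
    return -1 < 4294967295;
}
```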
sarnold
  • Matteo also mentioned the C standard behaving similarly, but thanks for this detailed list, anyway. I didn't know there is even an `LL` suffix (but it makes sense). – Christian Rau Nov 13 '11 at 00:47

I still wonder what was really going on inside the compiler

You can look at the generated assembly if you are interested in how the compiler interprets the code.

10000000000:

40054f:
mov    -0x4(%rbp),%eax
mov    %eax,-0x8(%rbp)
addl   $0x1,-0x4(%rbp)
jmp    40054f <main+0xb>

So it was simply compiled into an infinite loop with no comparison at all. If you replace 10000000000 with 10000, you get:

....
test   %al,%al
jne    400551
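The reasoning behind that unconditional `jmp` can be checked in plain C. A minimal sketch with a hypothetical `loop_condition` helper, assuming a 32-bit `int`: because `10000000000` has type `long` or `long long`, `i` is converted to that wider type before the comparison, so the condition holds for every `int` value and the compiler may drop the test entirely:

```c
#include <limits.h> /* INT_MIN, INT_MAX for trying the extremes */

/* The loop condition from the question, with an int counter. */
int loop_condition(int i)
{
    /* i is converted to the literal's wider type; no int value can
       reach 10000000000, so with a 32-bit int this always returns 1. */
    return i < 10000000000;
}
```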
Bo Persson
fghj
  • I also guessed he made it an infinite loop, but I didn't know by which rights he did so. But thanks for the assembler, anyway. – Christian Rau Nov 13 '11 at 00:51
  • Also, if you use gcc with -Wextra, it partly explains why: it should say something like "comparison is always true due to limited range of data type". – fghj Nov 13 '11 at 00:53
  • @user1034749: that's why it's a good idea to always run `gcc` with `-Wall -Wextra`. – Matteo Italia Nov 13 '11 at 00:58