
The LLVM language reference specifies integer types as iN, where N is the bit width of the integer and ranges from 1 to 2^23-1 (see http://llvm.org/docs/LangRef.html#integer-type).

I have 2 questions:

  1. When compiling a C program down to LLVM IR level, what types may be lowered to i1, i2, i3, etc? It seems like the types i8, i16, i32, i64 must be enough, so I was wondering what all the other nearly 8 million integer types are for.

  2. Is it true that both signed and unsigned integer types are lowered to i32? What is the reason for that, and why does it not apply to something like 32-bit float (which is represented as f32 in LLVM)?

Lukas Kalbertodt
Ali J

1 Answer


First of all, be aware that both arbitrary-sized integers and the absence of a signed/unsigned distinction were changes introduced in LLVM 2.0. Earlier versions had only a handful of integer types, each in signed and unsigned variants.

Now, to your questions:

  1. LLVM, though designed with C/C++ in mind, is not specific to those languages. Having more possible integer types gives you more flexibility. You don't have to use those types, of course - and I'm guessing that, as you mentioned, a C/C++ frontend to LLVM (e.g. Clang) would mostly generate i1, i8, i16, i32 and i64.

    Edit: apparently I'm mistaken and Clang does use some other integer types as well, see Jens's comment below.

  2. Yes, LLVM makes no distinction between signed and unsigned integer types, so both will be lowered to i32. The operations on those integers, though, are translated according to the original type: e.g. a division between unsigned integers becomes udiv while one between signed integers becomes sdiv. Because integers are represented in two's complement, many operations (e.g. add) don't care about signedness and so have only a single version.

    As for why no distinction was made in LLVM between signed and unsigned integers, read the details on this enhancement request - in short, carrying both signed and unsigned versions of each type led to significant IR bloat and hindered some optimizations, so the distinction was dropped.

    Finally, you ask why there is no f32 - the answer is that I don't know; maybe it was deemed less useful than arbitrarily-sized integers. Notice, however, that f32 is not really descriptive - if you want arbitrary floating-point types you need to specify at least the size of the significand and the size of the exponent, something like f23e8 instead of float and f52e11 instead of double. That's a bit cumbersome if you ask me, though I guess float and double could have been made synonyms for those.

Oak
  • Actually `f32` does exist for floating point numbers. The reason I asked it was that I thought if both signed and unsigned integers can be represented by i32, then maybe a 32-bit float can also be represented that way. I guess this was a detailed design decision as you mention. – Ali J Feb 06 '13 at 15:25
  • @AliJ I don't see `f32` listed on [the lang ref section on floating-point types](http://llvm.org/docs/LangRef.html#floating-point-types)... unless you meant that there's a common implementation associated with that name? – Oak Feb 06 '13 at 15:39
  • 4
    Thanks for this! As it happens, I just ran across two instructions that Clang generated: `store i576 %bla, i576* bitcast (%class.Foo* @_Global to i576*), align 8` and `%11 = load i384* %10, align 8`. I would assume that these are lowered to either a `memcpy` call or a smartish sequence of store/load instructions using SIMD types. – Jens Apr 29 '13 at 01:15