I wrote this program to test which data types arbitrary integer literals evaluate to. It was inspired by some other questions on StackOverflow.

How do I define a constant equal to -2147483648?

Why do we define INT_MIN as -INT_MAX - 1?

(-2147483648> 0) returns true in C++?

In these questions, we have an issue: the programmer wants to write INT_MIN as -2^31, but 2^31 is actually a literal and - is the unary negation operator. Since INT_MAX is usually 2^31 - 1 with a 32-bit int, the literal 2^31 cannot be represented as an int, and so it is given a larger data type. The second answer to the third question has a chart that determines the data type of an integer literal. The compiler goes down the list from the top until it finds a data type that can fit the literal.

Suffix    Decimal constants
------    -----------------
none      int
          long int
          long long int

In my little program, I define a macro that will return the "name" of a variable, literal, or expression, as a C-string. Basically, it returns the text that is passed inside of the macro, exactly as you see it in the code editor. I use this for printing the literal expression.

I want to determine the data type of the expression, what it evaluates to. I have to be a little clever about how I do this. How can we determine the data type of a variable or an expression in C? I've concluded that only two "bits" of information are necessary: the width of the data type in bytes, and the signedness of the data type.

I use the sizeof operator to determine the width of the data type in bytes. I also use another macro to determine whether the data type is signed. typeof() is a GNU compiler extension that yields the data type of a variable or expression, but I cannot read that data type directly. Instead, I cast -1 to whatever that data type is. If it's a signed data type, the result is still -1; if it's an unsigned data type, the result becomes the maximum value of that type.

#include <stdio.h>   /* C standard input/output - for printf()     */
#include <stdlib.h>  /* C standard library      - for EXIT_SUCCESS */

/**
 * Returns the name of the variable or expression passed in as a string.
 */
#define NAME(x) #x

/**
 * Returns 1 if the passed in expression is a signed type.
 * -1 is cast to the type of the expression.
 * If it is signed, -1 < 0 == 1 (TRUE)
 * If it is unsigned, UMax < 0 == 0 (FALSE)
 */
#define IS_SIGNED_TYPE(x) ((typeof(x))-1 < 0)

int main(void)
{

    /* What data type is the literal -9223372036854775808? */

    printf("The literal is %s\n", NAME(-9223372036854775808));
    printf("The literal takes up %u bytes\n", sizeof(-9223372036854775808));
    if (IS_SIGNED_TYPE(-9223372036854775808))
        printf("The literal is of a signed type.\n");
    else
        printf("The literal is of an unsigned type.\n");

    return EXIT_SUCCESS;
}

As you can see, I'm testing -2^63 to see what data type it is. The problem is that in ISO C99, the "largest" data type for integer literals appears to be long long int, if we can believe the chart. As we all know, long long int has a numerical range of -2^63 to 2^63 - 1 on a modern 64-bit system. However, the - above is the unary negation operator, not really part of the integer literal. So I'm really determining the data type of 2^63, which is too big for long long int. I'm attempting to cause a bug in C's type system. That is intentional, and only for educational purposes.

I am compiling and running the program. I use -std=gnu99 instead of -std=c99 because I am using typeof(), a GNU compiler extension, not actually part of the ISO C99 standard. I get the following output:

$ gcc -m64 -std=gnu99 -pedantic experiment.c
$
$ ./a.out
The literal is -9223372036854775808
The literal takes up 16 bytes
The literal is of a signed type.

I see that the integer literal equivalent to 2^63 evaluates to a 16 byte signed integer type! As far as I know, there is no such data type in the C programming language. I also don't know of any Intel x86_64 processor that has a 16 byte register to store such an rvalue. Please correct me if I'm wrong. What's going on here? Why is there no overflow? Also, is it possible to define a 16 byte data type in C? How would you do it?

Jonathan Leffler
Galaxy
  • Width and signedness aren't enough to identify a type, even an integer type. For example, `long` is almost always the same size as either `int` or `long long`. – user2357112 Jul 10 '18 at 21:49
  • Well, clearly neither `long` nor `long long` are 16 bytes! What do you suggest in order to identify an integer data type? – Galaxy Jul 10 '18 at 21:51
  • I believe some versions of gcc implement a 128-bit type, although it's typically "emulated in software", since you're right, most CPUs don't have registers and ALUs operating on such a size. – Steve Summit Jul 10 '18 at 21:52
  • If you want to make decisions based on the type of an expression, [`_Generic`](https://en.cppreference.com/w/c/language/generic) would be the standard way to go. – user2357112 Jul 10 '18 at 21:55
  • To print the result of `sizeof`, you should use `%zu` – Christian Gibbons Jul 10 '18 at 22:02
  • "Well, clearly neither long nor long long are 16 bytes!" this statement requires additional qualifiers. C++ only specifies the **minimum** size of the fundamental integer types and leaves the upper end restricted only by the size of the next type up (`int` cannot be larger than `long`). It is entirely possible to have a system where all integers `char` through `long long` are 128 bits wide. Or 4096 bits, for that matter. – user4581301 Jul 10 '18 at 22:54
  • Yes, that kind of data types arrangement would indeed conform to the standard! Actually, I meant that on my 64-bit computer both, `long` and `long long` are 8 bytes wide. – Galaxy Jul 10 '18 at 23:25
  • Also there are no negative literals. The expression `-3` is unary minus operator applied to the literal `3`. – M.M Jul 10 '18 at 23:36

3 Answers

Your platform likely has __int128 and 9223372036854775808 is acquiring that type.

A simple way to get a C compiler to print a typename is with something like:

int main(void)
{

    #define LITERAL (-9223372036854775808)
    _Generic(LITERAL, struct {char x;}/*can't ever match*/: "");

}

On my x86_64 Linux, the above generates the error message ‘_Generic’ selector of type ‘__int128’ is not compatible with any association, implying that __int128 is indeed the type of the literal.

(Given this, the warning "integer constant is so large that it is unsigned" is wrong. Well, gcc isn't perfect.)

Petr Skocik

After some digging, this is what I've found. I converted the code to C++, assuming that C and C++ behave similarly in this case. I created a template function so that it can accept any data type. I use __PRETTY_FUNCTION__, a GNU compiler extension that yields a C-string containing the "prototype" of the function: the return type, the name, and the formal parameters. I am interested in the formal parameters. Using this technique, I can determine the exact data type of the expression that gets passed in, without guessing!

#include <iostream>

/**
 * This is a templated function.
 * It accepts a value of any data type, which is labeled as "T".
 *
 * __PRETTY_FUNCTION__ is a GNU compiler extension: a C-string holding
 * the "pretty" name of the enclosing function, including its return
 * type and the types of its formal parameters.
 *
 * The compiler fills in T for each instantiation, so printing the string
 * at run time shows the data type of the expression that was passed in.
 */
template<typename T>
void foo(T value)
{
    (void)value;  /* only the type matters, not the value */
    std::cout << __PRETTY_FUNCTION__ << std::endl;
}

int main()
{
    foo(5);
    foo(-9223372036854775808);
    return 0;
}

Compiling and running, I get this output:

$ g++ -m64 -std=c++11 experiment2.cpp
$
$ ./a.out
void foo(T) [with T = int]
void foo(T) [with T = __int128]

I see that the passed in expression is of type __int128. Apparently, this is a GNU compiler specific extension, not part of the C standard.

Why isn't there int128_t?

https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gcc/_005f_005fint128.html

https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gcc/C-Extensions.html#C-Extensions

How is a 16 byte data type stored on a 64 bit machine

Galaxy
  • For C++, Scott Meyers has recommended printing types with a compiler error on a declared but undefined template `template class TP; TP tp;`. My answer's basically using a `_Generic`-based C version of that strategy. (Anyway, you might not want to answer C-tagged questions with C++ code. Everybody knows us C programmers often HATE C++ :D ). – Petr Skocik Jul 10 '18 at 22:22
  • @PSkocik No problem, added the `c++` tag! Anyway, I tend to use C++ features interchangeably with C features. If I want to use something that C++ has that C lacks, I just do `#ifdef __cplusplus` and put that straight into the `.c` source code file, or I use an `extern "C"` to link C and C++ code together. Well, just use the tools that are available in your hands. – Galaxy Jul 10 '18 at 22:27
  • For me it's just like using any GNU compiler extensions or inline assembly code in C. So yes, it's not pure C. – Galaxy Jul 10 '18 at 22:29
  • Whereas `C++` is a programming language, `++C` is `C` code with c++ code and various non-standard extensions mixed in! :D – Galaxy Jul 10 '18 at 22:29

With all warnings enabled (-Wall), gcc issues the warning: integer constant is so large that it is unsigned. Gcc assigns this integer constant the type __int128, and sizeof(__int128) = 16.
You can check that with a _Generic macro:

#include <stdio.h>

#define typestring(v) _Generic((v), \
    long long: "long long", \
    unsigned long long: "unsigned long long", \
    __int128: "__int128" \
    )

int main(void)
{
    printf("Type is %s\n", typestring(-9223372036854775808));
    return 0;
}

Type is __int128

Or with warnings from printf:

int main() {
    printf("%s", -9223372036854775808);
    return 0;
}

will compile with warning:

warning: format '%s' expects argument of type 'char *', but argument 2 has type '__int128' [-Wformat=]
KamilCuk