
Given the following piece of code, with a wrong printf format specifier for argument 'a':

#include <stdio.h>
void call(unsigned long long a, int b)
{
    printf("%lu,%d\n",a,b);
    printf("%llu,%d\n",a,b);
}
void main()
{
    call(0,1);
}

When you compile this normally, you get:

$ gcc m32.c
m32.c: In function ‘call’:
m32.c:4:12: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘long long unsigned int’ [-Wformat=]
     printf("%lu,%d\n",a,b);
            ^
$ ./a.out
0,1
0,1

But when you compile this with -m32, you get the following output:

$ gcc -m32 m32.c
m32.c: In function ‘call’:
m32.c:4:12: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘long long unsigned int’ [-Wformat=]
     printf("%lu,%d\n",a,b);
            ^
$ ./a.out
0,0
0,1

Obviously, the first printf is wrong, but as you can see, it is the second argument (b) whose printed value comes out wrong, even though the mismatched specifier is the one for a. I would not expect that to happen, and I cannot explain it. How is this possible?

Jan Heylen
  • Adding the `-m32` flag to gcc doesn't change the data types – Gaurav Pathak Apr 10 '17 at 14:36
  • You already know you pass a wrong parameter type; you get a compiler warning. So why do you wonder that the program invokes undefined behaviour? – too honest for this site Apr 10 '17 at 14:57
  • @GauravPathak: No, the data types are not changed! The `printf` format specifier is wrong for the types passed on **any** target. UB is UB. And the OP apparently is aware of that. – too honest for this site Apr 10 '17 at 14:58
  • Equally valid output would be `Get your stuff together, Jan. We've been working together for who knows how long and you're still writing UB? I thought you were better than this!`. It's UB. It doesn't have to make sense, because by definition, it won't. – Nic Apr 11 '17 at 01:17

3 Answers


Good answer, @Attie! Also, because you triggered my low-level passion :) I will try to give another perspective on the answer.

As you may know, with the 32-bit x86 calling convention function arguments are passed on the stack, while with the x86-64 (System V) convention the first integer arguments are passed in registers (RDI, RSI, RDX, RCX, R8, R9, in this order).

So, what matters for your issue is that when you compile for 32 bits, the arguments for the printf calls are passed on the stack. This is how your stack looks before each of the two printf calls:

[figure: stack layout before each printf call]

Each rectangular block on the stack is a 32-bit slot (because you are on the x86 architecture). You want to send a 64-bit number as the first argument to printf! To achieve that, the compiler splits the unsigned long long into two 32-bit parts and pushes them onto the stack separately. This is why you get two zeroes on the stack, along with the value one from the integer.
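
To make the split concrete, here is a minimal sketch of my own (not part of the question) that computes the two 32-bit halves the compiler places in those stack slots, assuming a little-endian x86 target:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    unsigned long long a = 0;   /* first argument of call() in the question */
    int b = 1;                  /* second argument */

    /* On 32-bit x86 the compiler passes 'a' as two 32-bit stack slots,
       the low word and the high word; 'b' occupies one more slot. */
    uint32_t lo = (uint32_t)(a & 0xFFFFFFFFu);
    uint32_t hi = (uint32_t)(a >> 32);

    printf("slots pushed for 'a': 0x%08x 0x%08x\n", (unsigned)lo, (unsigned)hi);
    printf("slot pushed for 'b':  0x%08x\n", (unsigned)b);
    return 0;
}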

Now let's analyze the first call of printf.

0,0

As it has the "%lu,%d\n" format string, it has to take one unsigned long and one int off the stack. An unsigned long (%lu) is 32 bits on x86, so printf takes only one block off the stack for it. After that, one more block is taken for the integer; since we only "consumed" one of the two zeroes with %lu, the %d picks up the other zero.

The second call of printf outputs the expected values.

0,1

This call is done with the "%llu,%d\n" format string. An unsigned long long (%llu) is 64 bits, so printf takes TWO blocks off the stack, thus printing a zero. After that, it takes one more block off the stack for the integer (the block holding the value one).
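
You can imitate what printf does with a small variadic function (a hypothetical helper of my own, not something from the question) and watch the same mismatch happen under -m32:

#include <stdarg.h>
#include <stdio.h>

/* Pulls values off the variadic argument list the way printf does:
   with whatever sizes the caller claims they have. */
static void read_like_printf(int as_long_long, ...)
{
    va_list ap;
    va_start(ap, as_long_long);
    if (as_long_long) {
        unsigned long long u = va_arg(ap, unsigned long long); /* consumes two 32-bit slots on x86 */
        int i = va_arg(ap, int);
        printf("%llu,%d\n", u, i);
    } else {
        unsigned long u = va_arg(ap, unsigned long);            /* consumes only one slot on x86 */
        int i = va_arg(ap, int);
        printf("%lu,%d\n", u, i);
    }
    va_end(ap);
}

int main(void)
{
    /* The argument is always an unsigned long long; reading it back as
       unsigned long is the same undefined behaviour as in the question,
       and under -m32 it typically yields 0,0. */
    read_like_printf(0, 0ULL, 1);
    read_like_printf(1, 0ULL, 1);
    return 0;
}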

You have to be really careful with the format string you pass to printf! The format string attack is a well-known class of attack based on exactly the kind of mismatch you showcased in your question.

Valy
  • Thanks a lot, this probably explains what we see in another, more complicated issue, so I'll accept your answer as it's the most detailed. – Jan Heylen Apr 10 '17 at 18:01

You'd better stop right here:

 printf("%lu,%d\n",a,b);

Supplying an argument that does not match the type expected by the conversion specifier causes undefined behavior. After that, whatever happens, no one is responsible.

Quoting C11, chapter §7.21.6.1

[...] If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.

Sourav Ghosh
  • It is indeed quite 'undefined', and you're probably right that I shouldn't try to explain it, but the goal is to explain another issue which is more complicated than this little example. – Jan Heylen Apr 10 '17 at 14:36

It's important to understand what you're saying.

printf("%lu,%d\n",a,b);

Is interpreted by printf() as:

  • There is a long unsigned
  • There is an int

In this case, you are lying - that isn't true. What makes this especially bad in your case is that your system changes the size of unsigned long between 32-bit and 64-bit builds (just like mine):

#include <stdio.h>

int main(void) {
        printf("sizeof(unsigned long): %zd\n", sizeof(unsigned long));
        printf("sizeof(unsigned long long): %zd\n", sizeof(unsigned long long));
        return 0;
}
$ gcc ll.c -o ll -m32 && ./ll
sizeof(unsigned long): 4
sizeof(unsigned long long): 8
$ gcc ll.c -o ll && ./ll
sizeof(unsigned long): 8
sizeof(unsigned long long): 8

So, yes, it'll work for 64-bit (by accident rather than by design), but with 32-bit, printf() takes wrong-sized values off the stack.

It's very important to make sure that the format string matches the arguments.
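
One straightforward way to keep them in agreement (my own sketch, reusing the call() function from the question) is to either fix the specifier or cast the argument to exactly the type the specifier names:

#include <stdio.h>

void call(unsigned long long a, int b)
{
    /* Either use the specifier that matches the argument's type... */
    printf("%llu,%d\n", a, b);
    /* ...or cast the argument to exactly the type the specifier names. */
    printf("%lu,%d\n", (unsigned long)a, b);
}

int main(void)
{
    call(0, 1);
    return 0;
}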


We can demonstrate the mismatch itself (ignoring those helpful warnings...):

#include <stdio.h>

int main(void) {
    unsigned long long x;
    int y;

    x = 0x8A7A6A5A4A3A2A1ALLU;
    y = 0x4B3B2B1B;

    printf("%lx - %x\n", x, y);
    printf("%llx - %x\n", x, y);

    return 0;
}
$ gcc ll.c -o ll -m32 && ./ll
ll.c: In function ‘main’:
ll.c:10:2: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘long long unsigned int’ [-Wformat=]
  printf("%lx - %x\n", x, y);
  ^
4a3a2a1a - 8a7a6a5a
8a7a6a5a4a3a2a1a - 4b3b2b1b
$ gcc ll.c -o ll && ./ll
ll.c: In function ‘main’:
ll.c:10:2: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘long long unsigned int’ [-Wformat=]
  printf("%lx - %x\n", x, y);
  ^
8a7a6a5a4a3a2a1a - 4b3b2b1b
8a7a6a5a4a3a2a1a - 4b3b2b1b

You can see that in the 32-bit run, the value of x is split across the two conversions: printf() has taken the 'first' 32 bits for %lx and then the 'next' 32 bits for %x, when actually we provided a single 64-bit value - endianness comes in to make this a little more confusing.


If you want to be prescriptive about your variable sizes, then take a look at my answer here: https://stackoverflow.com/a/43186983/1347519
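
For example, here is a minimal sketch of my own (not taken from the linked answer) that pairs fixed-width types with the matching <inttypes.h> format macros, so the format string stays correct on both 32-bit and 64-bit builds:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint64_t x = UINT64_C(0x8A7A6A5A4A3A2A1A);
    uint32_t y = UINT32_C(0x4B3B2B1B);

    /* PRIx64 / PRIx32 expand to the correct length modifiers for the
       target, so the same line works with and without -m32. */
    printf("%" PRIx64 " - %" PRIx32 "\n", x, y);
    return 0;
}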

Attie