I am working on a chat client and server. I currently have this line in my server for debugging purposes:

printf("Message for %s:\nTimestamp: %ld, Message: %s, Length: %d\n", args->name, *(int64_t*)(message->data), message->data+8, message->length);

args->name is a char* pointing to a normal null-terminated string, and message is a struct string*:

struct string
{
    char* data;
    uint32_t length;
    uint32_t capacity;
};

In this case, the first 8 bytes are a POSIX timestamp; the rest is just a null-terminated string.

If I compile with -m64 I get this output:

Message for some_user:
Timestamp: 1512060499, Message: >Server1@some_user:test, Length: 32

But compiling with -m32 yields this output:

Message for some_user:
Timestamp: 1512060650, Message: (null), Length: 69823144

Now the message is transferred to the client via a function containing this line:

write(socket_fd, message->data, message->length)

The really weird thing is, the message arrives at the client completely fine. I get exactly the same output on the client side.

Am I using the printf function wrong somehow?

user2736738
Kona98
  • `"%ld"` is for `long`, not necessarily `int64_t`. Use `"%" PRId64` with `int64_t`. – chux - Reinstate Monica Nov 30 '17 at 17:08
  • Wow, that fixed it, but why would that destroy my other pointers? – Kona98 Nov 30 '17 at 17:16
  • Code lied to `printf()` about `long` vs. `int64_t` and so it was confused, especially about what was afterward. – chux - Reinstate Monica Nov 30 '17 at 17:23
  • Because in 32-bit mode, you pass an 8-byte value on the stack, but printf thinks it is a `long` (four bytes in ILP32). It then takes the upper 4 bytes of your timestamp as the pointer to the message (assuming little endian, this will probably be 0). It then takes the pointer as the length. – JeremyP Nov 30 '17 at 17:24
  • Because you put 8 bytes on the stack where printf expected 4, all the remaining arguments are 4 bytes further down the stack than printf is expecting. – JeremyP Nov 30 '17 at 17:32
  • `*(int64_t*)(message->data)` is probably UB too (strict aliasing violation) – M.M Mar 03 '18 at 22:29
  • @M.M: not if all the other accesses to those memory locations are through `char *data`, because `char*` can alias anything. As I understand it, you'd only have a strict-aliasing problem if you had other places in the code that do something like `*(int32_t*)message->data = 1234;` – Peter Cordes Mar 03 '18 at 23:15
  • @PeterCordes `char` can alias anything, but `int64_t` cannot alias `char` . The rule is not symmetric . – M.M Mar 03 '18 at 23:20
  • @M.M: But can't you look at that memory location *as* an `int64_t` which often has some of its bytes written through a `char*`? Does that argument only work if you got the memory from `void *malloc()` originally, rather than with `data` pointing to a `char array[]`? My understanding may be flawed here. – Peter Cordes Mar 03 '18 at 23:24
  • @PeterCordes You can only look at it as the declared type (if it points to a named variable), or as the last type written (if it points to malloc'd space) – M.M Mar 04 '18 at 02:51
  • @M.M: I'm not sure I buy the "last type written" for malloc'ed space. Then if you use malloc'ed space for `int64_t`, then write a byte of one of those integers through a `char*`, it would be UB to read the data through an `int64_t*`? But I thought the "`char*` can alias anything" rule specifically made that safe. Anyway, agreed on the non-malloc'ed case for a declared type of `char` or `char[]`: accesses to the array type, rather than through a `char*` to it, aren't allowed to alias with `int64_t*` reads of the same data. – Peter Cordes Mar 04 '18 at 03:13
  • @PeterCordes Yes, your first example would be UB for that reason. You can read the strict aliasing rule in the C Standard if you don't believe it. If you write as `char` and then read as `int64_t`, you are aliasing a char as `int64_t` (which is not allowed). You are not aliasing an int64_t as char. In any case, comments are not the place to argue about this, there are plenty of other questions on this topic – M.M Mar 04 '18 at 03:17
  • @M.M: Maybe I'm interpreting the wording incorrectly, but N1570 (ISO C11) says in 6.5 6) `If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object ...`. 6.5 7) says `An object shall have its stored value accessed only by an lvalue expression`... of the right type, or a character type, and 3.1 says "access" means to read *or modify* the value of an object. Obviously you can read an int64_t written by memcpy, but you're arguing you have to copy the whole object – Peter Cordes Mar 04 '18 at 03:51

1 Answer

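Printing with a mismatched format specifier is undefined behavior (UB). The original call does this in two places: %ld for an int64_t argument and %d for a uint32_t argument.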

// printf("Message for %s:\nTimestamp: %ld, Message: %s, Length: %d\n", 
//    args->name, *(int64_t*)(message->data), message->data+8, message->length);

#include <inttypes.h>

printf("Message for %s:\nTimestamp: %" PRId64 ", Message: %s, Length: %" PRIu32 "\n", 
    args->name, *(int64_t*)(message->data), message->data+8, message->length);

Casting an arbitrarily aligned pointer to int64_t* and dereferencing it may cause problems, such as a misaligned access. It is better to copy the bytes.

#include <string.h>

int64_t t64;
memcpy(&t64, message->data, sizeof t64);
printf("Message for %s:\nTimestamp: %" PRId64 ", Message: %s, Length: %" PRIu32 "\n", 
    args->name, t64, message->data+8, message->length);
chux - Reinstate Monica
  • Explanation of exactly what goes wrong with variadic functions when the caller and callee don't agree on the types being passed / looked-for: [Why va_arg() produce different effects on x86_64 and arm?](https://stackoverflow.com/questions/49041919/why-va-arg-produce-different-effects-on-x86-64-and-arm/49042074#49042074). In this case, 32-bit `printf` was only consuming the first half of `timestamp`, and using the 2nd half as the pointer. (The high half on little-endian 32-bit x86, which is why it was `0`, the binary representation of a NULL pointer on that platform). – Peter Cordes Mar 03 '18 at 23:20