28

In a recent question, someone mentioned that when printing a pointer value with printf, the caller must cast the pointer to void *, like so:

int *my_ptr = ....

printf("My pointer is: %p", (void *)my_ptr);

For the life of me I can't figure out why. I found this question, which is almost the same. The answer to that question is correct: it explains that ints and pointers are not necessarily the same length.

This is, of course, true, but when I already have a pointer, like in the case above, why should I cast from int * to void *? When is an int * different from a void *? In fact, when does (void *)my_ptr generate any machine code that's different from simply my_ptr?

UPDATE: Multiple knowledgeable responders quoted the standard, saying that passing the wrong type may result in undefined behavior. How? I expect printf("%p", (int *)ptr) and printf("%p", (void *)ptr) to generate the exact same stack frame. When will the two calls generate different stack frames?

zmbq
  • You are about to get a whole bunch of "undefined behavior" answers, where in essence, there is no difference whether you use an `int*` or a `void*` (as you said, no machine code is generated for this sort of casting). – barak manos Jul 21 '14 at 14:44
  • Actually, I got a couple of good answers explaining the theoretic difference. – zmbq Jul 21 '14 at 14:45
  • Let me know if you find a platform (compiler, linker, IDE, CPU, OS, whatever) under which these two options yield two **different** results... In fact, it would be interesting if you could add this requirement as part of your question... – barak manos Jul 21 '14 at 14:48
  • I don't think I can find one, but it doesn't really matter. I wanted to understand the logic behind this requirement. Now I do. – zmbq Jul 21 '14 at 14:50
  • Regarding "real architectures" on which the pointer representations for different types may be different see this top answer: http://stackoverflow.com/questions/8007825/list-of-platforms-supported-by-the-c-standard/8115542#8115542 . A char/void pointer has a different representation than a word pointer. – bockmabe Aug 18 '16 at 23:36
  • @barakmanos "there is no difference whether you use an `int*` or a `void*`" is a dangerously low standard to code to. It really means "In the very small number of times I tried that, I didn't notice a failure." When you write code that results in undefined behavior **you never know when it will fail in the future**. Upgrade your C library? ***BOOM!*** Recompile with a new version of your compiler? ***BOOM*** Port your code to another architecture? ***BOOM*** It's hard enough to write bug-free C code in perfect conditions. There is no acceptable reason for deliberate UB in your code. Ever. – Andrew Henle Feb 21 '21 at 01:55

7 Answers

32

The %p conversion specifier requires an argument of type void *. If you don't pass an argument of type void *, the function call invokes undefined behavior.

From the C Standard (C11, 7.21.6.1p8, The fprintf function):

"p - The argument shall be a pointer to void."

Pointer types in C are not required to have the same size or the same representation.

An example of an implementation where pointer types have different representations is Cray PVP, where void * and char * are 64-bit but the other pointer types are 32-bit.

See "Cray C/C++ Reference Manual", Table 3. in "9.1.2.2" http://docs.cray.com/books/004-2179-003/004-2179-003-manual.pdf

ouah
  • Nitpick: You can't "invoke" undefined behaviour; it's something a program has, or rather doesn't have. The behaviour of the program is not defined (i.e. there is no well-defined behaviour) – Lightness Races in Orbit Jan 01 '20 at 17:34
  • @LightnessRacesinOrbit Hush, we are all mighty wizards with the rare power of invoking unpredictable chaos and absolute terror through summoning upon the invincible weapon of *undefined behavior*, famous for making programmers everywhere tremble! – A P Jo Oct 08 '20 at 16:12
  • @LightnessRacesinOrbit It can depend on the flow of execution, for example `if ( getchar() == 'x' ) 1/0;` only has undefined behaviour if the input has `x` at that moment. It seems fine to me to use "invokes UB" to mean "the behaviour of the program is undefined if execution reaches this statement". – M.M Feb 21 '21 at 01:22
  • @LightnessRacesinOrbit "invokes undefined behavior" is an idiomatic expression in C. It has a long legacy in C, from the usage in comp.lang.c by senior members to C Committee members usage. – ouah Feb 23 '21 at 11:17
  • @ouah: The meaning nowadays, however, has diverged radically from that described in the published Rationale for the C Programming Standard, since the latter included actions that were "non-portable" but correct, including "popular extensions" which the extremely vast majority of implementations were expected to process identically. – supercat Apr 01 '21 at 19:49
  • Note GCC will not warn you about it except when you compile with `-pedantic` you will see `warning: format ‘%p’ expects argument of type ‘void *’, but argument 2 has type ‘int *’` – Rain Feb 08 '22 at 06:46
20

In the C language, all pointer types potentially differ in their representations. So, yes, int * is different from void *. A real-life platform that illustrates this difference might be difficult (or impossible) to find, but at the conceptual level the difference is still there.

In other words, in the general case different pointer types may have different representations. int * is different from void * and different from double *. The fact that your platform uses the same representation for void * and int * is nothing more than a coincidence, as far as the C language is concerned.

The language states that some pointer types are required to have identical representations, which includes void * vs. char *, pointers to different struct types or, say, int * and const int *. But these are just exceptions to the general rule.

AnT stands with Russia
  • Of course, all pointers having the same representation is not a coincidence, it's on purpose (remember far and huge pointers? Yuck), but still - good answer. I'll accept it once SO allows me. – zmbq Jul 21 '14 at 14:44
  • If the real-life platform is impossible to find, then how can it be a coincidence? Instead it means they are the same on every platform. Can you clear my doubt? – Suraj Jain Feb 08 '17 at 19:18
  • @Suraj Jain: Firstly, "impossible to find today" does not necessarily mean "impossible to find tomorrow". If the language authors continue to keep that distinction, there must be a reason for it. – AnT stands with Russia Jun 24 '17 at 07:33
  • Secondly, there are real-life platforms where the *valid* representations are not really the same. The underlying storage might be the same in size and kind, but representations can easily be different. For example, platforms that strictly enforce data alignment (like Sun Sparc) would require the lowest bits of a `double *` pointer to be set to zeros, while a `char *` pointer has no such requirement. Even though physically the same storage is allocated for these pointers, attempting to use a `char *` pointer as a `double *` pointer can easily lead to a trap. – AnT stands with Russia Jun 24 '17 at 07:35
10

Other people have adequately addressed the case of passing an int * to a prototyped function with a fixed number of arguments that expects a different pointer type.

printf is not such a function. It is a variadic function, so the default argument promotions are applied to its anonymous arguments (i.e. everything after the format string), and if the promoted type of any argument does not exactly match the type expected by the corresponding conversion specifier, the behavior is undefined. In particular, even if int * and void * have identical representation,

int a;
printf("%p\n", &a);

has undefined behavior.

This is because the layout of the call frame may depend on the exact concrete type of each argument. ABIs that specify different argument areas for pointer and non-pointer types have occurred in real life (e.g. the Motorola 68000 would like you to keep pointers in the address registers and non-pointers in the data registers to the maximum extent possible). I'm not aware of any real-world ABI that segregates distinct pointer types, but it's allowed and it would not surprise me to hear of one.

zwol
  • Thought I'd add a ref for the often used acronym [ABI](http://en.wikipedia.org/wiki/Application_binary_interface) – chux - Reinstate Monica Mar 27 '15 at 21:03
  • This explanation has always left me unsatisfied - why couldn't the standard just mandate an implicit promotion to `void *` for any pointer type when calling a variadic function, exactly as it does for `float` -> `double` and for integral types? On virtually any platform it would be a no-op, and I expect it to be pretty cheap even on the Crays quoted above (almost surely cheaper than a `float` to `double` conversion). – Matteo Italia Aug 08 '17 at 13:57
  • @MatteoItalia Yeah, I don't know why they didn't do that, but it's not there even in C11 (as far as I see on a quick skim). – zwol Aug 08 '17 at 14:55
  • @MatteoItalia The standardization of C in the late 1980s was about codifying ***existing*** implementations. Given the existing memory models [that had variable pointer sizes](https://en.wikipedia.org/wiki/Intel_Memory_Model#Pointer_sizes) there likely was no common basis to standardize implicit pointer promotion in variadic functions in ways that would not break existing code bases. Merely standardizing the `void` type itself was probably about as far as the committee could go - K&R C has no `void` type. – Andrew Henle Feb 21 '21 at 01:49
  • @AndrewHenle: On the flip side, the authors of the Standard explicitly recognized in the published Rationale that implementations can and in many cases should, on a quality-of-implementation basis, extend the semantics of the language in ways appropriate to the target platform and intended usage by specifying how they will process actions where the Standard imposes no requirements. If a construct behaved usefully on 95% of implementations, but malfunctioned unpredictably on 5%, code using the construct would be "non-portable", but that does not mean "erroneous". – supercat Apr 01 '21 at 19:53
5

C11, 7.21.6.1 The fprintf function (p8):

p The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printing characters, in an implementation-defined manner.

haccks
5

when printing a pointer value with printf, the caller must cast the pointer to void *

Even casting to void * is not sufficient for all pointers.

C has two kinds of pointers: pointers to objects and pointers to functions.

Any object pointer can be converted to void * with no problem:

printf("My pointer is: %p", (void *)my_ptr); // OK when my_ptr points to an object

A conversion of a pointer to a function to void * is not fully defined.

Consider a system in 2021 where void * is 64-bit and a function pointer is 128-bit.


C does specify (my emphasis)

Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type. C17dr § 6.3.2.3 6

To print a function pointer, one could attempt:

printf("My function pointer is: %jX", (uintmax_t) my_function_ptr); // Selectively OK

C lacks a truly universal pointer and lacks a clean way to print function pointers.

chux - Reinstate Monica
3

In reality, except on ancient mainframes/minis, different pointer types are extremely unlikely to have different sizes. However, they are different types, and per the specification for printf, calling it with an argument of the wrong type for the format specifier results in undefined behavior. This means don't do it.

R.. GitHub STOP HELPING ICE
  • Ancient mainframes. That might explain it. – zmbq Jul 21 '14 at 14:43
  • Even in modern microcontrollers, function pointers and data pointers are still likely to have different sizes due to the separate data and instruction address in Harvard architecture – phuclv Jul 21 '14 at 17:29
  • @LưuVĩnhPhúc: Function pointers are a completely different issue than OP's. They don't even necessarily *have* a conversion to/from data pointers. OTOH all data pointers have conversions to/from `void *` and are usually represented the exact same way. – R.. GitHub STOP HELPING ICE Jul 21 '14 at 20:02
  • but don't you need a cast to `void *` when printing a function pointer? – phuclv Jul 22 '14 at 16:32
  • @LưuVĩnhPhúc: There is no portable way to print a function pointer, though the implementation-defined cast to `void *` usually works on systems you might care about. – R.. GitHub STOP HELPING ICE Jul 22 '14 at 16:50
1

Addressing the question:

When will the two calls generate different stack frames?

The compiler may notice that the behaviour is undefined, and issue an exception, illegal instruction, etc. There's no requirement for the compiler to attempt to generate a stack frame, function call or whatever.

See here for an example of the compiler doing this in another case of UB. Instead of generating a dereference instruction with a null argument, it generates a ud2 illegal instruction.

When the behaviour is undefined according to the language standard, there are no requirements on the compiler's behaviour.

M.M