In the GLib documentation, there is a chapter on type conversion macros. In the discussion on converting an int to a void* pointer it says (emphasis mine):

Naively, you might try this, but it's incorrect:

gpointer p;
int i;
p = (void*) 42;
i = (int) p;

Again, that example was not correct, don't copy it. The problem is that on some systems you need to do this:

gpointer p;
int i;
p = (void*) (long) 42;
i = (int) (long) p;

(source: GLib Reference Manual for GLib 2.39.92, chapter Type Conversion Macros).

Why is that cast to long necessary?

Should any required widening of the int not happen automatically as part of the cast to a pointer?

Aykhan Hagverdili
sleske
  • 3
    I think because an int can be 16bit while a long is at least 32bit you might get 16 undefined bits if you cast it from int directly. But then on a 64bit machine, long might still be 32bit while a pointer could have size 64bit, getting the same issue (if it exists at all). – invalid_id Aug 19 '14 at 11:13
  • 4
    Casting integer types to pointers is *implementation-defined*, which means that a conforming compiler must document exactly what happens here. It would be nice if the author of this quote specified *which* systems required the `long` cast (and even nicer if they eschewed this technique entirely, since there are more reliable alternatives) – M.M Aug 19 '14 at 11:36
  • 1
    Yes that'd be one (or `intptr_t`) – M.M Aug 19 '14 at 11:58
  • 3
    @KerrekSB If you are going to convert back the pointer to the same type, as opposed to a wider type, you don't care how it is “extended” as long as the conversion from pointer to a narrow integer type keeps only the least significant bits (which is the usual behavior). Since there is not even an allusion to the kind of compiler that would define these conversions so as to cause trouble, I have to assume superstitious nonsense on the part of the glib authors. – Pascal Cuoq Aug 19 '14 at 12:10
  • 2
    https://developer.gnome.org/glib/stable/glib-Basic-Types.html I rest my case. This is the work of a person or persons who do not understand what they are trying to offer a compatibility layer for. `#define G_MINFLOAT FLT_MIN` for “the minimum positive value which can be held in a gfloat” is plain wrong. More importantly, not a single definition is useful if you have a C99 compiler, and only a few to provide compatibility with C90. – Pascal Cuoq Aug 19 '14 at 12:19
  • I edited `*void` to `void*` because I think that was a typo. If you had a reason for that, you can roll it back and I would like to know the reason. – Aykhan Hagverdili May 20 '20 at 11:48
  • 1
    @Ayxan: Thanks for posting a bounty, I'm still curious to get an answer :-). And yes, the "*void" was a typo. – sleske May 20 '20 at 18:22
  • Would you accept "The glib documentation is wrong, at least for their chosen example (and also in general). The example is also very poorly chosen." as an answer? 'Cause it *is* the answer, but people might not appreciate that. – EOF May 23 '20 at 17:46
  • @EOF: Yes, as long as the answer is adequately explained. – sleske May 23 '20 at 17:50

5 Answers

13

The glib documentation is wrong, both for their (freely chosen) example, and in general.

gpointer p;
int i;
p = (void*) 42;
i = (int) p;

and

gpointer p;
int i;
p = (void*) (long) 42;
i = (int) (long) p;

will both lead to identical values of i and p on all conforming C implementations.
The example is poorly chosen, because 42 is guaranteed to be representable by int and long (C11 draft standard n1570: 5.2.4.2.1 Sizes of integer types).

A more illustrative (and testable) example would be

int f(int x)
{
  void *p = (void*) x;
  int r = (int)p;
  return r;
}

This will round-trip the int-value iff void* can represent every value that int can, which practically means sizeof(int) <= sizeof(void*) (theoretically: padding bits, yadda, yadda, doesn't actually matter). For other integer types, same problem, same actual rule (sizeof(integer_type) <= sizeof(void*)).
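A minimal sketch of that round trip, written with an explicit intermediate intptr_t so that it compiles without a size-mismatch warning. The name roundtrip_int is made up for illustration, and the result of each pointer conversion is still implementation-defined; it behaves as described with GCC and Clang on common targets:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Precondition from the text: a pointer must be at least as wide as an int. */
_Static_assert(sizeof(int) <= sizeof(void *), "int does not fit in void*");

/* Hypothetical helper: store an int in a void* and read it back. */
int roundtrip_int(int x)
{
    void *p = (void *)(intptr_t)x;   /* widen first, then convert to pointer */
    return (int)(intptr_t)p;         /* pointer back to integer, then narrow */
}
```

If the static assertion fails on some target, the function cannot be expected to return its argument for every int.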

Conversely, the real problem, properly illustrated:

void *p(void *x)
{
  char c = (char)x;
  void *r = (void*)c;
  return r;
}

Wow, that can't possibly work, right? (actually, it might). In order to round-trip a pointer (which software has done unnecessarily for a long time), you also have to ensure that the integer type you round-trip through can unambiguously represent every possible value of the pointer type.
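To see what a narrow intermediate type actually keeps, here is a small sketch. It uses unsigned char rather than char so that the narrowing step is fully defined by the standard, and it assumes the optional type uintptr_t exists; the helper names are hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: what is left of a pointer after squeezing it
   through an unsigned char. */
uintptr_t squeeze_through_uchar(void *p)
{
    unsigned char c = (unsigned char)(uintptr_t)p;  /* keeps only the low bits */
    return (uintptr_t)c;
}

/* Returns 1 if the pointer survived the narrow round trip unchanged. */
int survives_uchar(void *p)
{
    return (void *)squeeze_through_uchar(p) == p;
}
```

Only pointers whose integer representation fits in an unsigned char can survive, which is why the narrow round trip "might" work for some values and fails for typical object addresses.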

Historically, much software was written by monkeys that assumed that pointers could round-trip through int, possibly because of K&R C's implicit-int "feature" and lots of people forgetting to #include <stdlib.h> and then casting the result of malloc() to a pointer type, thus accidentally round-tripping through int. On the machines the code was developed for, sizeof(int) == sizeof(void*), so this worked. When the switch to 64-bit machines with 64-bit addresses (pointers) happened, a lot of software expected two mutually exclusive things:

1) int is a 32-bit 2's complement integer (typically also expecting signed overflow to wrap around)
2) sizeof(int) == sizeof(void*)

Some systems (cough Windows cough) also assumed sizeof(long) == sizeof(int), most others had 64-bit long.
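Which of these size assumptions hold on a given platform can be checked directly. The sketch below uses hypothetical helper names and assumes 8-bit bytes when it labels the data model:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: name the data model implied by the sizes of
   int, long and void*. Anything unrecognized is reported as "other". */
const char *data_model(void)
{
    size_t i = sizeof(int), l = sizeof(long), p = sizeof(void *);

    if (i == 4 && l == 4 && p == 4) return "ILP32 (int, long, pointer: 32 bits)";
    if (i == 4 && l == 8 && p == 8) return "LP64 (long and pointer: 64 bits)";
    if (i == 4 && l == 4 && p == 8) return "LLP64 (long: 32 bits, pointer: 64 bits)";
    return "other";
}

/* Returns 1 if long is at least as wide as a pointer, which is what a
   pointer -> long -> pointer round trip needs in practice. */
int long_is_pointer_sized(void)
{
    return sizeof(long) >= sizeof(void *);
}
```

On an LLP64 platform such as 64-bit Windows, long_is_pointer_sized() would return 0, which is the case where long does not help.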

Consequently, on most systems, changing the round-tripping intermediate integer type to long fixed the (unnecessarily broken) code:

void *p(void *x)
{
  long l = (long)x;
  void *r = (void*)l;
  return r;
}

except of course, on Windows. On the plus side, for most non-Windows (and non 16-bit) systems sizeof(long) == sizeof(void*) is true, so the round-trip works both ways.

So:

  • the example is wrong
  • the type chosen to guarantee round-trip doesn't guarantee round-trip

Of course, the C standard has a (naturally standard-conforming) solution in intptr_t/uintptr_t (C11 draft standard n1570: 7.20.1.4 Integer types capable of holding object pointers), which are specified to guarantee the
pointer -> integer type -> pointer
round-trip (though not the reverse).
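A sketch of that guaranteed direction. Since the types are optional, the example first checks for UINTPTR_MAX, which <stdint.h> defines only when uintptr_t exists; roundtrip_pointer is a name invented for this example:

```c
#include <assert.h>
#include <stdint.h>

#ifndef UINTPTR_MAX
#error "uintptr_t is optional and this implementation does not provide it"
#endif

/* Hypothetical helper: pointer -> uintptr_t -> pointer. */
void *roundtrip_pointer(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;   /* compares equal to p per C11 7.20.1.4 */
}
```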

EOF
  • 1
    Most of this answer is focused on pointers round-tripping through an `int`, but the question is about an `int` round-tripping through a pointer. I've seen code elsewhere (on Windows) doing that "int -> long -> pointer" instead of "int -> pointer". That's the main question. Why would anyone do that? – Aykhan Hagverdili May 23 '20 at 22:19
  • @Ayxan If the *goal* is to round-trip an `int` through a `void*`, then interposing a `long` is **absolutely and completely useless**. Perhaps then the main lesson from this answer is this: programmers are lazy, stupid monkeys that will choose to perform a raindance over learning the rules of the language they are using whenever they think they can get away with it. The basic rule of programming seems to be "if it *is* broken, don't fix it either". – EOF May 23 '20 at 22:23
  • frustratingly, that apparently is the answer, even though I've seen it been done a couple of times... – Aykhan Hagverdili May 23 '20 at 22:40
  • 1
    "will both lead to identical values of i and p on all conforming c implementations." - this is not true; `(void *)42` might immediately cause a trap on some implementation, and there's no requirement that different implementations give the same result for those that don't trap. Also there is no requirement that, for the same implementation, `(void *)42 == (void *)(long)42` – M.M May 26 '20 at 10:32
  • @M.M I don't say that the code will work the same on all conforming implementations, I say that there will be no difference between the code *with* the cast to `long` and the code *without* the cast to `long`. The only reason I can see for a difference would be UB (like the immediate trap you mention), in which case *both* are UB, and any difference is purely incidental and cannot be relied on. – EOF May 26 '20 at 11:32
  • 2
    The result of the cast is implementation-defined, the implementation could define that `(int)42` has a different result to `(long)42` – M.M May 26 '20 at 12:17
9

According to C99 6.3.2.3:

5 An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

6 Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

According to the documentation at the link you mentioned:

Pointers are always at least 32 bits in size (on all platforms GLib intends to support). Thus you can store at least 32-bit integer values in a pointer value.

Furthermore, long is guaranteed to be at least 32 bits wide.

So, the code

gpointer p;
int i;
p = (void*) (long) 42;
i = (int) (long) p;

is safer, more portable and well defined for integers of up to 32 bits only, as advertised by GLib.
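The two size assumptions this argument rests on can be written down as compile-time checks. This is only a sketch: the helper pointer_bits is hypothetical, and passing these checks says nothing about whether the cast through long is needed:

```c
#include <assert.h>
#include <limits.h>

/* GLib's stated platform assumption: pointers are at least 32 bits wide. */
_Static_assert(sizeof(void *) * CHAR_BIT >= 32, "pointers narrower than 32 bits");

/* Always true on a conforming implementation: long has at least 32 bits. */
_Static_assert(LONG_MAX >= 2147483647L, "long narrower than 32 bits");

/* Hypothetical helper: width of a pointer in bits, for run-time reporting. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```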

askmish
  • 1
    Sorry, but I do not follow. How do the quotes you list explain that the cast to `long` is safer and more portable? – sleske Aug 19 '14 at 14:47
  • 1
    @sleske: Pointers are always at least 32 bits in size (where GLib would run) and `long` is guaranteed to be at least 32 bits. So there is no size mismatch on systems where GLib will be running. Whereas `int` is not guaranteed to be 32-bit. – askmish Aug 19 '14 at 14:49
  • @sleske: and more so the portability guarantee is only for integers up to 32 bits. – askmish Aug 19 '14 at 14:55
  • 1
    I'm getting why you cast 42 to long and then to void*. But what is the point of casting first to long and then to int? If p is longer originally, you first truncate it to long, then to int. If p is shorter than int you also win nothing. – mcsim Feb 26 '16 at 15:56
  • @mcsim I can't think of any besides avoiding type size mismatch warnings. – a3f Dec 11 '16 at 00:12
  • I still don't understand the point of casting to long first. What if the pointer is 64 bits and long is 32 bits? – Aykhan Hagverdili May 20 '20 at 11:56
  • 2
    This answer doesn't seem to explain why the code is "safer, more portable", just asserts that it is – M.M May 20 '20 at 20:36
7

As I understand it, the code (void*)(long)42 is "better" than (void*)42 because it gets rid of this warning for gcc:

cast to pointer from integer of different size [-Wint-to-pointer-cast]

on environments where void* and long have the same size, but different from int. According to C99, §6.4.4.1 ¶5:

The type of an integer constant is the first of the corresponding list in which its value can be represented.

Thus, 42 is interpreted as int. Had this constant been assigned directly to a void* (when sizeof(void*)!=sizeof(int)), the above warning would pop up, and everyone wants clean compilations. This is the problem (issue?) the GLib doc is pointing to: it happens on some systems.
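The difference can be sketched as follows. The function names are hypothetical, and the comments describe how GCC and Clang are expected to behave on targets where int is narrower than a pointer:

```c
#include <assert.h>
#include <stdint.h>

/* A direct cast such as
       void *p = (void *)n;      // n is an int
   is what triggers -Wint-to-pointer-cast when sizeof(int) != sizeof(void *).
   The variants below first convert to a wider integer type. */

/* Quiet only where long has the same size as a pointer. */
void *int_to_ptr_via_long(int n)
{
    return (void *)(long)n;
}

/* Quiet wherever intptr_t has pointer size, which is the usual case. */
void *int_to_ptr_via_intptr(int n)
{
    return (void *)(intptr_t)n;
}
```

Neither variant changes what the conversion means; the intermediate cast only makes the widening explicit so the compiler has nothing to warn about.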


So, two issues:

  1. Assign integer to pointer of same size
  2. Assign integer to pointer of different size

Curiously, even though both cases have the same status in the C standard and in the gcc implementation notes, gcc only shows the warning for 2.

On the other hand, it is clear that casting to long is not always the solution (still, on modern ABIs sizeof(void*)==sizeof(long) most of the time): there are many possible combinations depending on the sizes of int, long, long long and void*, for 64-bit architectures and in general. That is why the GLib developers try to find the matching integer type for pointers and assign glib_gpi_cast and glib_gpui_cast accordingly for the Meson build system. Later, these Meson variables are used here to generate those conversion macros the right way (see also this for basic GLib types). Eventually, those macros first cast an integer to another integer type of the same size as void* for the target architecture (such a conversion conforms to the standard, so no warnings).
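A sketch of what such macros can look like when the intermediate type comes from <stdint.h> instead of configure-time discovery. The MY_ names are invented for this example and are not GLib's actual definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macros in the spirit of the GLib ones. They rely on the
   optional types intptr_t/uintptr_t instead of a type picked by the
   build system. */
#define MY_INT_TO_POINTER(i)   ((void *)(intptr_t)(i))
#define MY_POINTER_TO_INT(p)   ((int)(intptr_t)(p))
#define MY_UINT_TO_POINTER(u)  ((void *)(uintptr_t)(u))
#define MY_POINTER_TO_UINT(p)  ((unsigned int)(uintptr_t)(p))
```

Separate signed and unsigned variants keep the widening value-preserving in both cases, so the stored integer comes back unchanged on the usual implementations.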

This solution to get rid of that warning is arguably a bad design that is nowadays solved by intptr_t and uintptr_t, but it is possible that it is there for historical reasons: intptr_t and uintptr_t have been available since C99, and GLib started its development earlier, in 1998, so they found their own solution to the same problem. It seems that there were some attempts to change it:

GLib depends on various parts of a valid C99 toolchain, so it's time to use C99 integer types wherever possible, instead of doing configure-time discovery like it's 1997.

No success, however: it seems it never got into the main branch.


In short, as I see it, the original question has changed from why this code is better to why this warning is bad (and is it a good idea to silence it?). The latter has been answered elsewhere, but this could also help:

Converting from pointer to integer or vice versa results in code that is not portable and may create unexpected pointers to invalid memory locations.

But, as I said above, this rule doesn't seem to qualify for a warning for issue number 1 above. Maybe someone else could shed some light on this topic.

My guess for the rationale behind this behaviour is that gcc decided to throw a warning whenever the original value is changed in some way, even if subtle. As gcc doc says (emphasis mine):

A cast from integer to pointer discards most-significant bits if the pointer representation is smaller than the integer type, extends according to the signedness of the integer type if the pointer representation is larger than the integer type, otherwise the bits are unchanged.

So, if sizes match there is no change on the bits (no extension, no truncation, no filling with zeros) and no warning is thrown.
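The effect of the extension rule can be made visible by spelling out the widening, so that the standard's integer conversion rules decide the bits rather than the compiler's documented pointer conversion. The helper names are hypothetical, and the pointer conversions themselves remain implementation-defined (the comments describe GCC's documented behavior):

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Signed source: int -> intptr_t preserves the value, i.e. sign-extends. */
uintptr_t bits_from_signed(int n)
{
    void *p = (void *)(intptr_t)n;     /* same size as pointer: bits unchanged */
    return (uintptr_t)p;
}

/* Unsigned source: unsigned -> uintptr_t preserves the value, i.e. zero-extends. */
uintptr_t bits_from_unsigned(unsigned int u)
{
    void *p = (void *)(uintptr_t)u;    /* same size as pointer: bits unchanged */
    return (uintptr_t)p;
}
```

With a pointer wider than int, bits_from_signed(-1) has every bit set, while bits_from_unsigned(UINT_MAX) has only the low bits set; for non-negative values such as 42 the two agree.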

Also, [u]intptr_t is just a typedef of the appropriate qualified integer: it is not justifiable to throw a warning when assigning [u]intptr_t to void* since it is indeed its purpose. If the rule applies to [u]intptr_t, it has to apply to typedefed integer types.

Fusho
  • And why is there a warning anyway? Why does it disappear when we cast to `long` first? – Aykhan Hagverdili May 26 '20 at 11:00
  • there is a warning if `const` is removed: https://gcc.godbolt.org/z/LMQ83s – Fusho May 26 '20 at 11:13
  • Yes, as I say, I understand it should *not* disappear when casted to `long` first, but `gcc` seem to have taken that approach – Fusho May 26 '20 at 11:16
  • 1
    As this question is tagged as `C`, take the `C` version where even with `const` there is a warning: https://gcc.godbolt.org/z/plcxib – Fusho May 26 '20 at 11:20
  • Clang has the same behavior https://gcc.godbolt.org/z/_prHLY . There's got to be a reason for that? – Aykhan Hagverdili May 26 '20 at 11:20
  • Well, clang follows gcc as far as it can, it all boils down to why gcc does it that way I guess – Fusho May 26 '20 at 11:22
  • @Ayxan: please, check my edit for a possible rationale about that warning – Fusho May 26 '20 at 12:09
  • How come copying sign bits is considered a change in value? That happens when we cast it to long anyway – Aykhan Hagverdili May 26 '20 at 12:51
  • Right, but the cast to `long` **is well defined** in the C standard, not so the cast to `void*` from integer. The latter is *implementation defined* – Fusho May 26 '20 at 12:54
  • The fact that gcc does the same bit extension for `(void*)42` as the standard specifies for `(long)42` could be considered just a coincidence, other implementation could do any other thing they want to do – Fusho May 26 '20 at 13:01
  • cast from `long` to `void*` is just as implementation-defined as from `int` to `void*`, so I don't see your point. – Aykhan Hagverdili May 26 '20 at 13:30
  • yes, it is as *implementation-defined*, and that is why I say it should raise a warning anyway (for me, anything that is not in the standard should raise a warning), **but** gcc decided not to raise it when the types are the same size. Why? My guess is at the end of my answer – Fusho May 26 '20 at 13:33
  • I agree with you. That's probably the reason why the cast is done commonly. I find it unjustified though. – Aykhan Hagverdili May 26 '20 at 13:35
  • @Ayxan: the last paragraph in my new edit could give some further justification – Fusho May 26 '20 at 18:16
6

I think it is because this conversion is implementation-dependent. It is better to use uintptr_t for this purpose, because it is wide enough to hold a pointer in the particular implementation.

DoctorMoisha
  • 4
    `size_t` is an unsigned integer type intended to hold the maximum size of an object, not the size of a pointer. `uintptr_t` is an unsigned integer type intended to hold the representation of a pointer, and its existence in the standard shows in itself that `size_t` is not that. – Pascal Cuoq Aug 19 '14 at 11:59
5

As explained in Askmish's answer, the conversion from an integer type to a pointer is implementation defined (see e.g. N1570 6.3.2.3 Pointers §5 §6 and the footnote 67).

The conversion from a pointer to an integer is implementation defined too and if the result cannot be represented in the integer type, the behavior is undefined.

On most general-purpose architectures nowadays, sizeof(int) is less than sizeof(void *), so even these lines

int n = 42;
void *p = (void *)n;

generate a warning when compiled with clang or gcc (see e.g. here):

warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

Since C99, the header <stdint.h> has provided a number of integer types, some of them optional. Two in particular should be used here (n1570 7.20.1.4 Integer types capable of holding object pointers):

The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:

intptr_t  

The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:

uintptr_t  

These types are optional.

So, while a long may be better than int, to avoid undefined behaviour the most portable (but still implementation defined) way is to use one of those types(1).

Gcc's documentation specifies how the conversion takes place.

4.7 Arrays and Pointers

The result of converting a pointer to an integer or vice versa (C90 6.3.4, C99 and C11 6.3.2.3).

A cast from pointer to integer discards most-significant bits if the pointer representation is larger than the integer type, sign-extends(2) if the pointer representation is smaller than the integer type, otherwise the bits are unchanged.

A cast from integer to pointer discards most-significant bits if the pointer representation is smaller than the integer type, extends according to the signedness of the integer type if the pointer representation is larger than the integer type, otherwise the bits are unchanged.

When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.
[...]
(2) Future versions of GCC may zero-extend, or use a target-defined ptr_extend pattern. Do not rely on sign extension.

Others, well...


The conversions between different integer types (int and intptr_t in this case) are mentioned in n1570 6.3.1.3 Signed and unsigned integers

  1. When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

  2. Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

  3. Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.


So, if we start from an int value and the implementation provides an intptr_t type and sizeof(int) <= sizeof(intptr_t) or INTPTR_MIN <= n && n <= INTPTR_MAX, we can safely convert it to an intptr_t and then convert it back.

That intptr_t can be converted to a void * and then converted back to the same (1)(2) intptr_t value.

The same doesn't hold in general for a direct conversion between an int and a void *, even if in the example provided, the value (42) is small enough not to cause undefined behaviour.
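The chain int -> intptr_t -> void * -> intptr_t -> int can be sketched with a callback that takes the usual void * "user data" parameter. The names on_event, fire_event and last_seen are hypothetical, and the static assertion encodes the range condition stated above:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Every int must be representable as intptr_t for the first step to be lossless. */
_Static_assert(INT_MIN >= INTPTR_MIN && INT_MAX <= INTPTR_MAX,
               "intptr_t cannot hold every int");

static int last_seen;   /* records what the callback received */

/* Hypothetical callback with a void* "user data" parameter. */
void on_event(void *user_data)
{
    last_seen = (int)(intptr_t)user_data;
}

/* Hypothetical caller: passes an int through the void* parameter. */
void fire_event(int value)
{
    on_event((void *)(intptr_t)value);
}
```

As noted in (2), the standard does not explicitly promise that the integer comes back unchanged from the pointer, so this relies on the implementation's documented conversion.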


Personally, I find the reasons given for those type conversion macros in the linked GLib documentation quite debatable (emphasis mine):

Many times GLib, GTK+, and other libraries allow you to pass "user data" to a callback, in the form of a void pointer. From time to time you want to pass an integer instead of a pointer. You could allocate an integer [...] But this is inconvenient, and it's annoying to have to free the memory at some later time.

Pointers are always at least 32 bits in size (on all platforms GLib intends to support). Thus you can store at least 32-bit integer values in a pointer value.

I'll let the reader decide whether their approach makes more sense than a simple

#include <stdio.h>

void f(void *ptr)
{
    int n = *(int *)ptr;
    //      ^ Yes, here you may "pay" the indirection
    printf("%d\n", n);
}

int main(void)
{
    int n = 42;

    f((void *)&n);
}

(1) I'd like to quote a passage in this Steve Jessop's answer about those types

Take this to mean what it says. It doesn't say anything about size.
uintptr_t might be the same size as a void*. It might be larger. It could conceivably be smaller, although such a C++ implementation approaches perverse. For example on some hypothetical platform where void* is 32 bits, but only 24 bits of virtual address space are used, you could have a 24-bit uintptr_t which satisfies the requirement. I don't know why an implementation would do that, but the standard permits it.

(2) Actually, the standard explicitly mentions the void* -> intptr_t/uintptr_t -> void* conversion, requiring those pointers to compare equal. It doesn't explicitly mandate that in the case intptr_t -> void* -> intptr_t the two integer values compare equal. It just mentions in footnote 67 that "The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.".

Bob__
  • 1
    Your answer doesn't mention anything at all about casting to `long` or some other integer type right before casting to a pointer type? That is the whole question. If I am going to cast it to a pointer type anyway, what's the point of cast to an integer type? How is `(void*)(intptr_t)42` better than `(void*)42`? – Aykhan Hagverdili May 20 '20 at 16:43
  • @Ayxan In my snippet `n` (an `int`) is cast to `intptr_t` (which is an *integer* type) then to `void *` when it is passed to `g` (which you may assume is one of the GLib functions). Inside `g`, the pointer is cast again to `intptr_t` before being stored inside the `int`. You can cast the `int` to another integer type because it's a well defined operation (if bits have to be added, the correct ones will be) when the size of the other type is bigger. You should use `intptr_t`/`uintptr_t` because it's the only integer type that is guaranteed to convert to a `void *` and back. – Bob__ May 20 '20 at 17:03
  • So you are saying we cast it to `intptr_t` first, which is well defined, then we cast it to `void*` which is specifically well defined because of the part of the standard you quoted, so the whole process is well defined, unlike if we did int to pointer directly? – Aykhan Hagverdili May 20 '20 at 17:09
  • If that's your point, then it makes sense to cast to `intptr_t` first to be technically well defined. In that case casting to `long` doesn't make any sense at all. – Aykhan Hagverdili May 20 '20 at 17:10
  • Actually I'd say implementation defined, because the standard doesn't specify the sizes (neither their absolute values nor the relation between) of `int` and `void *`. – Bob__ May 20 '20 at 17:13
  • If it's still implementation-defined, what do we have to gain then? – Aykhan Hagverdili May 20 '20 at 17:15
  • @Ayxan That's why I added the static assertion. If you are asking if it's a portable and advisable method, again, that's a big no IMHO. It sounds like a bad designed API. There are other ways to embed information into pointers (in the lower or higher bits), used e.g. in some small string optimizations, but even those have portability issues. – Bob__ May 20 '20 at 17:22