43

As a newcomer to C, I'm confused about when casting a pointer is actually OK.

As I understand, you can pretty much cast any pointer type to any other type, and the compiler will let you do it. For example:

int a = 5;
int* intPtr = &a;
char* charPtr = (char*) intPtr; 

However, in general this invokes undefined behavior (though it happens to work on many platforms). This said, there seem to be some exceptions:

  • you can cast to and from void* freely (?)
  • you can cast to and from char* freely (?)

(at least I've seen it in code...).

So which casts between pointer types are not undefined behaviour in C?

Edit:

I tried looking into the C standard (section "6.3.2.3 Pointers", at http://c0x.coding-guidelines.com/6.3.2.3.html ), but didn't really understand it, apart from the bit about void*.

Edit2:

Just for clarification: I'm explicitly only asking about "normal" pointers, i.e. not about function pointers. I realize that the rules for casting function pointers are very restrictive. As a matter of fact, I've already asked about that :-): What happens if I cast a function pointer, changing the number of parameters

jww
sleske

4 Answers

38

Basically:

  • a T * may be freely converted to a void * and back again (where T * is not a function pointer), and you will get the original pointer.
  • a T * may be freely converted to a U * and back again (where T * and U * are not function pointers), and you will get the original pointer if the alignment requirements are the same. If not, the behaviour is undefined.
  • a function-pointer may be freely converted to any other function-pointer type and back again, and you will get the original pointer.

Note: T * (for non-function-pointers) always satisfies the alignment requirements for char *.

Important: None of these rules says anything about what happens if you convert, say, a T * to a U * and then try to dereference it. That's a whole different area of the standard.
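
A minimal sketch of the first bullet and of the note about char *, assuming nothing beyond standard C99. The helper names `survives_void_round_trip` and `char_pointer_addresses_first_byte` are made up for illustration and do not come from the standard or from any library.

```c
#include <stdbool.h>

/* Hypothetical helper: converts an object pointer to void * and back,
   then reports whether the result compares equal to the original. */
bool survives_void_round_trip(int *original)
{
    void *generic = original;     /* implicit conversion, no cast needed in C */
    int *restored = generic;      /* implicit conversion back */
    return restored == original;  /* the round trip yields the original pointer */
}

/* Hypothetical helper: a character pointer never has an alignment problem,
   and the converted pointer addresses the first byte of the object. */
bool char_pointer_addresses_first_byte(int *original)
{
    unsigned char *bytes = (unsigned char *)original;
    return (void *)bytes == (void *)original;
}
```

Both functions only convert and compare pointers. Neither one dereferences through the "wrong" type, which is the separate question mentioned above.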

Oliver Charlesworth
  • 3
    In addition to point two, if the alignment requirements are the same, a cast back from U* to T* will compare equal to the original T*. – James Greenhalgh Jan 26 '11 at 22:03
  • Isn't that second point exactly the same as referring to the size of the two types? – atx Jan 26 '11 at 22:03
  • @malfy: Not exactly. In theory, the compiler is free to enforce any exotic alignment requirements it likes. – Oliver Charlesworth Jan 26 '11 at 22:06
  • @Oli: On standard x86/x86-64 Linux and Windows systems, these alignments are always the same as the size of the type specified in the C standard though. And no compiler that I know of (so far), will actually have the size of a type not equal to the alignment. – atx Jan 26 '11 at 22:10
  • You might want to also clarify that `T*` refers to pointers to 'objects' and do not include pointers to functions. – Michael Burr Jan 26 '11 at 22:17
  • 2
    @Oli Charlesworth: About "if the alignment requirements are the same": What is the "alignment requirement", and how am I supposed to know it for a given type? Does the standard say something about it. Is there some operator like `alignment`, like `sizeof`? – sleske Jan 26 '11 at 22:18
  • @sleske: It's simply implementation-defined what the alignment requirements are, I'm afraid. – Oliver Charlesworth Jan 26 '11 at 22:22
  • @sleske: That was my exact point, they are essentially the same. When he writes about alignment, he refers to the way it is stored in memory. In this scenario, you can just think of the size. – atx Jan 26 '11 at 22:22
  • @malfy: On most practical systems, yes, this is a perfectly reasonable assumption. I can imagine that there are probably systems out there where this isn't the case, though, so if you're on an exotic platform, it's best to keep this in mind, and check. – Oliver Charlesworth Jan 26 '11 at 22:24
  • 3
    @malfy: there's no requirement that alignment be the same as an object's size. An object's size will influence its alignment requirements, and the alignment of an object will influence the size of the object (in that the size will have to be a multiple of the alignment). But there's nothing that says that an object's size and alignment need to be the same. There are even platforms where alignment isn't a requirement at all. – Michael Burr Jan 26 '11 at 22:29
  • @Oli Charlesworth: I often see people cast to `char` & back again. Is there maybe some rule that `char` does not need any special alignment (it's the byte type, after all)? Or does this just happen to apply on all popular platforms? – sleske Jan 26 '11 at 22:33
  • 6
    @sleske: A `T *` (for any non-function `T`) is guaranteed by design to be correctly aligned for a `char *`. What's more, the standard guarantees that one is allowed to dereference the `char *` and access the underlying data (this isn't true for any other destination type). – Oliver Charlesworth Jan 26 '11 at 22:35
  • 1
    @Oli Charlesworth: Ah, I see, so `char` is special in that it does not require any particular alignment, unlike some other types. Maybe you could include that into your answer? – sleske Jan 26 '11 at 22:38
  • @malfy the C standard does not actually say anything about the specific sizes of objects, i.e. you do not actually know whether an int is 16, 32, 64, 128, 13, or even 39273 bit. It could literally be anything. Only requirement by the standard is that certain types are bigger than others, e.g. sizeof(short) <= sizeof(int) <= sizeof(long). – wich Jan 26 '11 at 22:46
  • 1
    @sleske: It's really just a special case of the second point in my answer. – Oliver Charlesworth Jan 26 '11 at 22:51
  • 5
    @sleske: Due to the way arrays and object sizes interact (the size of a type is equal to the spacing of that type within an array), required alignment of a type must evenly divide the size of a type. Since the size of `char` is fixed as 1, the alignment of `char` cannot be anything else but 1 also. – caf Jan 27 '11 at 04:57
  • @caf: Thanks for the explanation. I took the liberty of editing some info into the original answer. – sleske Jan 27 '11 at 13:30
  • @sleske: No offence intended, but I've undone that edit. I prefer my version, because it succinctly summarises the contents of 6.3.2.3, without duplication. – Oliver Charlesworth Jan 27 '11 at 13:45
  • 2
    @Oli: Well, it's your answer :-). I'd appreciate some mention of the fact that T*->char*->T* always works. It's true that it logically follows from point two, but I feel that that is far from obvious, yet important, because `char*` is often used as a "generic pointer". An answer should be as concise as possible, but no more than that ;-) (with apologies to Einstein). Maybe you could add some note? – sleske Jan 27 '11 at 13:52
  • 1
    @OliverCharlesworth: what's the point in casting if I don't dereference it? Also I thought you could cast to void anything, but 2nd bullet here, says otherwise? http://spin.atomicobject.com/2014/05/19/c-undefined-behaviors/ – Giorgi Moniava Feb 23 '15 at 19:40
  • @Giorgi: You may need to cast to `void*` and back to transport a pointer in a `struct` that is used for holding different pointer types (to avoid having a `union` listing every expected type). – sleske Aug 21 '15 at 09:23
  • @Giorgi: About that 2nd bullet, I think you mixed that up: You can cast _anything to void_, but not vice versa (see 2nd bullet point of this answer). – sleske Aug 21 '15 at 09:24
  • A question about the 2nd bullet. If I cast `uint8_t*` to `uint32_t*` **and** I know that the value of the original `uint8_t*` is a multiple of 4, is this cast then well defined, or would the cast still cause undefined behaviour because it violates the 2nd bullet? – mercury0114 Nov 08 '20 at 14:19
10

Oli Charlesworth's excellent answer lists all cases where casting a pointer to a pointer of a different type gives a well-defined result.

In addition, there are four cases where casting a pointer gives implementation-defined results:

  • You can cast a pointer to a sufficiently large (!) integer type. C99 has the optional types intptr_t and uintptr_t for this purpose. The result is implementation-defined. On platforms that address memory as a contiguous stream of bytes ("linear memory model", used by most modern platforms), it usually returns the numeric value of the memory address the pointer points to, thus simply a byte count. However, not all platforms use a linear memory model, which is why this is implementation-defined :-).
  • Conversely, you can cast an integer to a pointer. If the integer has a type large enough for intptr_t or uintptr_t and was created by casting a pointer, casting it back to the same pointer type will give you back that pointer (which however may no longer be valid). Otherwise the result is implementation-defined. Note that actually dereferencing the pointer (as opposed to just reading its value) may still be UB.
  • You can cast a pointer to any object to char*. Then the result points to the lowest addressed byte of the object, and you can read the remaining bytes of the object by incrementing the pointer, up to the object's size. Of course, which values you actually get is again implementation-defined...
  • You can freely cast null pointers; they'll always stay null pointers regardless of pointer type :-).

Source: C99 standard, sections 6.3.2.3 "Pointers", and 7.18.1.4 "Integer types capable of holding object pointers".
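
A rough sketch of the pointer -> integer -> pointer round trip from the first two bullets. It assumes the implementation provides the optional type uintptr_t, and it goes through void * because that is the conversion the guarantee in 7.18.1.4 is worded for. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: round-trips an object pointer through uintptr_t.
   Assumes uintptr_t exists (it is optional in C99). */
bool survives_uintptr_round_trip(int *original)
{
    uintptr_t as_integer = (uintptr_t)(void *)original; /* value is implementation-defined */
    int *restored = (int *)(void *)as_integer;          /* converts back to the same pointer */
    return restored == original;
}
```

The numeric value held in `as_integer` is not portable. Only the fact that converting it back gives a pointer equal to the original is guaranteed, and only while the pointed-to object is still alive.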

As far as I can tell, all other casts of a pointer to a pointer of a different type are undefined behavior. In particular, if you are not casting to char or a sufficiently large integer type, it may always be UB to cast a pointer to a different pointer type - even without dereferencing it.

This is because the types may have different alignment, and there is no general, portable way to make sure different types have compatible alignment (except for some special cases, such as signed/unsigned integer type pairs).
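
For what it's worth, C11 (which postdates this answer) added the `_Alignof` operator, so the alignment requirement of a type can at least be queried. The sketch below is not portable: it assumes that uintptr_t and uint32_t exist and that converting a pointer to uintptr_t yields a plain byte address, which is implementation-defined. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: heuristically checks whether p is suitably aligned
   for uint32_t. ASSUMPTION: a linear address space in which the integer
   value of a pointer is a byte address. The standard does not promise this. */
bool looks_aligned_for_uint32(const void *p)
{
    return ((uintptr_t)p % _Alignof(uint32_t)) == 0;
}
```

A check like this could be run before converting an unsigned char * to uint32_t *. Even when it passes, it says nothing about whether dereferencing the converted pointer is allowed.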

sleske
  • I think it might be better to start with the second point (conversion from pointer to integer) and then specify that if a pointer-to-integer conversion has ever yielded a particular integer value, and the object identified by the pointer at the time of the conversion that yielded that value is still valid, then casting that particular integer value to a pointer is defined by the standard to yield a pointer equivalent to the original. I don't think implementations are required to promise any behaviors beyond that. – supercat May 06 '15 at 23:17
  • 1
    @supercat: Thanks for the suggestion. Actually, I think C99 is even stricter: The roundtrip pointer->int->pointer is only guaranteed to give back the same pointer if the integer type used is `(u)intptr_t` (C99, 7.18.1.4 "Integer types capable of holding object pointers") - you can't just use any sufficiently large int type. I edited my post. – sleske May 08 '15 at 14:57
  • If type `intptr_t` exists [its existence is optional] then an integer type is capable of holding all the values which `intptr_t` could hold, I would expect a conversion from a pointer to that type would be processed as equivalent to a conversion to `intptr_t` followed by a conversion to that other type. Such behavior (the fact that any size integer can be coerced to any other) would be the only basis for `unsigned short x = somePointer;` having any meaning on machines where `unsigned short` can't hold a pointer. – supercat May 08 '15 at 15:05
  • As it is, while I can't see much *use* for `short x=somePointer;` its meaning is established by the standard. Given `short x1=p1,x2=p2; int result = (x1==x2);`, the result would be the same as would be computed by `uintptr_t u1=p1,u2=p2; int result = ((short)u1==(short)u2);`. – supercat May 08 '15 at 15:08
  • @supercat: Yes, that seems correct. I edited again :-). Hope my text is right now. – sleske May 08 '15 at 15:13
  • The ability to convert a pointer to a numeric type does not imply a linear address space. On 8086, a pointer could be converted to an unsigned long whose value was `(segmentPart)*65536ul+offsetPart`; some other machine use even more "interesting" conventions. Also, while `(char*)someArbitraryValue;` might merely yield Implementation-Defined behavior when `someArbitraryValue`, rvalue conversion of an invalid pointer, as would be necessary in an assignment statement, would produce Undefined Behavior whether or not any attempt is made to reference the pointer in question. – supercat May 08 '15 at 15:17
  • @supercat: Yes, I know. The "linear address space" remark was only meant as an illustration, not as a requirement. Edited again. – sleske May 08 '15 at 15:21
  • Personally I don't like the rule that makes all forms of rvalue conversion involving invalid pointers (including dead pointers) UB, and would favor one that limited UB to those cases in which a pointer either had never been valid, would be dereferenced, or would have an integer added/subtracted to/from it. As it is, though, given `char *p,*q; uintptr_t u; ... u=(uintptr_t)p; free(p);` the standard states that `q=p;` would be undefined behavior and there's no reason to believe that `q=(char*)u;` would not be as well. – supercat May 08 '15 at 15:22
  • I wonder if it might be good to offer a realistic example of a platform where a "linear-mapping" assumption would fail: hardware uses bits 0-23 of an address to select one of 16,777,216 32-bit words of memory; the compiler uses 30-31 of a `char*` to select one of the bits within a byte. – supercat May 08 '15 at 15:41
5

Generally, if (as is usual nowadays) the pointers themselves have the same alignment properties, the problem is not the cast itself but whether you may access the data through the pointer.

Casting a T* to void* and back is well defined for any object type T: you are guaranteed to get exactly the same pointer back. void* is the catch-all object pointer type.

For other casts between object pointer types there is no such guarantee. Accessing an object through such a pointer may cause all sorts of problems, such as misaligned accesses (bus errors) or trap representations of integers. Different pointer types are not even guaranteed to have the same width, so theoretically you might even lose information.

One cast that should always work, though, is to (unsigned char*). Through such a pointer you may then investigate the individual bytes of your object.
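
A small sketch of that kind of byte inspection, assuming only standard C. The helper name `copy_object_bytes` is made up for illustration. It walks the object representation of a double through an unsigned char * and copies it out.

```c
#include <stddef.h>

/* Hypothetical helper: copies the object representation of *value into out,
   one byte at a time, through a pointer to unsigned char. */
void copy_object_bytes(const double *value, unsigned char out[sizeof(double)])
{
    const unsigned char *bytes = (const unsigned char *)value;
    for (size_t i = 0; i < sizeof *value; ++i) {
        out[i] = bytes[i];   /* reading through unsigned char is always allowed */
    }
}
```

Which byte values end up in `out` depends on how the platform represents a double. The access itself is well defined.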

Jens Gustedt
  • 1
    Note that casting from one pointer type to another pointer type might result in undefined behavior even if you don't attempt to dereference the pointer (if the pointer would be of an incorrect alignment for the destination type). – Michael Burr Jan 26 '11 at 22:35
  • 1
    There's no need for the `(void *)` in `(unsigned char *)(void *)`. – R.. GitHub STOP HELPING ICE Jan 27 '11 at 00:10
  • AFAIK, casting a pointer to a different type and then dereferencing it is *always* UB, or at least implementation-defined (except maybe if the two types are just `typedefs` for the same type). – sleske Jan 27 '11 at 14:29
  • 1
    @sleske: no, not always. The case of casting to character type pointers is explicitly foreseen in the standard. As far as I am able to read this through the lines accessing through a pointer of different type is only UB *if* the target memory can not be interpreted as a valid object of the target type. Since this can never happen for `unsigned char` this is ok. (I find the standard particularly unclear about alignment issues, do they mean the alignment of the pointer type, or of the pointed-to type?) – Jens Gustedt Jan 27 '11 at 16:07
  • @Jens Gustedt: Yes, casting pointer -> `char` is not UB. However, what you get when using the resulting pointer is implementation-defined. Hence my "UB, or at least implementation-defined". – sleske Jan 27 '11 at 16:15
  • 2
    @sleske: I didn't say `char` I said `unsigned char`. For `signed char` (and so for `char` if it is signed) this is special because of the possible problems with trap representations for the type. But by definition in the standard `unsigned char` never has trap representations or padding bits, and so accessing the individual bytes as `unsigned char` is always well defined and there is no room for interpretation. Access through `unsigned char` to the components of an object is foreseen as such at several places of the standard. – Jens Gustedt Jan 27 '11 at 16:22
  • 1
    @Jens Gustedt: Yes, that's correct (sorry about the signed/unsigned thing, my mistake). My point is that while you certainly can access the components of an object as bytes by casting to `unsigned char*`, the *values you get* for the components are implementation-defined. You can cast e.g. `double*` to `char*`, but the values you get by dereferencing the result will depend on the platform's representation of FP values (hence implementation-defined). Or am I missing something? – sleske Jan 27 '11 at 16:38
  • 1
    @sleske, in that sense, yes, it is implementation-defined. You made it sound as if that were a negative thing :) I think here this is positive, since it lets you detect how a double is stored on your machine, if you ever wanted to know. – Jens Gustedt Jan 27 '11 at 17:09
  • @Jens: Yes, that's what I meant. And it's negative if you want to write cross-platform code, and positive if you want to use the specific properties of a given platform. So it depends :-). – sleske Jan 27 '11 at 17:15
  • @sleske: Out of curiosity, if one accesses the bytes of a pointer (e.g. by casting a char** into a char* and dereferencing it) is there any guarantee that the resulting values will be part of the 'normal' range of integers? One must be able to copy byte values verbatim (for things like memcpy to work) but is there any guarantee that any other type of arithmetic operation will work on them? Could one legally have an implementation in which every addressable memory location was large enough to hold an 'long' or a pointer, plus a flag which indicated which it held? – supercat Mar 21 '11 at 19:55
  • @sleske: One of the common difficulties with designing a garbage-collector for C is that it's possible to decompose a pointer into a series of bytes, perform an arbitrary transform on them, and later reverse the transform and reconstitute the pointer. Is there anything which requires that such a thing has to actually work? – supercat Mar 21 '11 at 19:57
  • @supercat: Sorry, I'm not quite sure I understand your question, and I don't think I could answer it. Why don't you just ask it as a new question? – sleske Mar 21 '11 at 21:57
  • @gio, no, you seem to be confusing the direction of the cast. Casting something from `void*` that represents an object that is byte aligned to a type that needs stronger alignment is undefined. If you first cast your type `T*` to `void*` and then back again, this can never happen, and in fact the standard guarantees that this is ok. – Jens Gustedt Feb 23 '15 at 21:36
  • Casting to another type other than a character type is never ok, even if by coincidence the alignment is the same. And yes, SomeType* --> void * --> SomeType* is always ok. – Jens Gustedt Feb 24 '15 at 16:31
0

The authors of the Standard made no attempt to weigh the costs and benefits of supporting conversions among most combinations of pointer types on platforms where such support would be expensive, since:

  1. Most platforms where such conversions would be expensive would likely have been obscure ones the authors of the Standard didn't know about.

  2. People using such platforms would be better placed than the authors of the Standard to weigh the costs and benefits of such support.

If some particular platform uses a different representation for int* and double*, I think the Standard would deliberately allow for the possibility that e.g. round-trip conversion from double* to int* and back to double* would work consistently but conversions from int* to double* and back to int* might fail.

I don't think the authors of the Standard intended that such operations might fail on platforms where such conversions cost nothing. They described the Spirit of C in the charter and rationale documents as including the principle "Don't prevent [or needlessly obstruct] the programmer from doing what needs to be done." Given that principle, there would be no need for the Standard to mandate that implementations process actions in a way that helps programmers accomplish what they need to do in cases where doing so would cost nothing, since implementations that make a bona fide effort to uphold the Spirit of C will behave in such fashion with or without a mandate.

supercat