
My best-effort reading of the C specification (C99, primarily) makes me think that it is valid to cast (or implicitly convert, where `void *`'s implicit conversion behavior applies) between any of these types:

`void *`, `char *`, `signed char *`, `unsigned char *`

I expect that this will trigger no undefined behavior, and that those pointers are guaranteed to have the same underlying representation.

Consequently, it should be possible to take a pointer of any one of those four types that is already pointing to an address which can be legally dereferenced, typecast and/or assign it to one of the three char type pointers, and dereference it to access the same memory, with the only difference being whether your code will treat the data at that location as a `char`, `signed char`, or `unsigned char`.

Is this correct? Is there any version of the C standard (the lack of a `void *` type in pre-standardization C notwithstanding) where this is not true?

P.S. I believe that this question is answered piecemeal in passing in a lot of other questions, but I've never seen a single clear answer where this is explicitly stated/confirmed.

mtraceur
    Actually, you *shouldn't* cast `void *`, just because it is implicitly convertible from and to other pointer types. In fact casting a `void *` can lead to subtle bugs. See e.g. [this question and answers](http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc) for more details. – Some programmer dude Sep 15 '15 at 05:35
  • @JoachimPileborg Thank you. I tried my best to word the question in a way that drew attention to the fact that `void *` indeed has implicit conversion behavior (see first sentence), but I wasn't aware that there were any issues with explicitly casting `void *`. – mtraceur Sep 15 '15 at 05:38
  • @JoachimPileborg Follow-up: Okay, I've read that question and a bunch of the top answers you linked to, and I see that the subtle bugs in question aren't technically-wrong-use-of-the-language-to-explicitly-cast-a-void-pointer sorts of bugs, but rather bugs that happen because an explicit cast of a void pointer can hide other issues that more readily surface without the explicit cast. Would you say that is correct? – mtraceur Sep 15 '15 at 05:46
  • Something like that yeah. :) – Some programmer dude Sep 15 '15 at 05:48
    @mtraceur See Deduplicator's answer to [my similar question](http://stackoverflow.com/a/24727105/539810) for some more information. I was asking about copying the representation of an object, but the answer is still relevant. Paragraph 6 quoted in the answer is primarily talking about conversions from `void *` to another pointer type. As Deduplicator mentioned in the comments below that answer, beware of alignment issues. –  Sep 15 '15 at 06:05
  • @ChronoKitsune Thanks. That does indeed answer at least part of the question. Similarly, so does [this question's accepted answer](http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior). And this all goes in line with my reading of the C99 standard: I just wanted to get confirmation from other folks on here who clearly know the standard much better than I do, and get a clear answer to this exact facet of the language in one question, as opposed to scattered about implicitly and partially in many other questions. – mtraceur Sep 15 '15 at 06:39
    With the exception of the case where the input pointer does not point to data (and it is not NULL), yes, that's safe, but I don't feel like crawling through the standards docs to find proof. – Jasen Sep 15 '15 at 07:37
  • @Jasen Thanks. I will edit to exclude the cases of pointers that would otherwise be undefined/illegal to dereference anyway. – mtraceur Sep 15 '15 at 07:44
    I was thinking more on the case of pointer to functions where code is in a separate memory area. – Jasen Sep 15 '15 at 07:48
  • @Jasen I see. Yeah, I guess my wording does not exclude that possibility clearly enough. I'll work on figuring out a better wording, but you're welcome to suggest an edit that you think is clearer. – mtraceur Sep 15 '15 at 07:51
    some info here too, http://stackoverflow.com/questions/30535814/pass-unsigned-char-pointer-to-atoi-without-cast – Giorgi Moniava Sep 15 '15 at 07:52
  • @Giorgi Interesting: So it seems that even though the pointers to character types are, near as I can tell, guaranteed to have identical representation to `void *`, and be interchangeable with the appropriate cast, there does not seem to be any prohibition about a compiler making a fuss about it, which is certainly worth knowing. – mtraceur Sep 15 '15 at 08:00
  • @Giorgi (cont) I mean "interchangeable" as far as still validly pointing to the same memory - obviously the logic of the code changes somewhat depending on whether you're dereferencing through a signed vs unsigned character type pointer. – mtraceur Sep 15 '15 at 08:03
  • @mtraceur: Yes I think mixing them is tricky, read answer by K. Thompson. Other answer is wrong I think; here too: http://stackoverflow.com/questions/24767522/passing-unsigned-char-array-to-string-functions?lq=1 – Giorgi Moniava Sep 15 '15 at 08:03
    "(lack of void * type in the pre-ISO C specification not withstanding)" - C89 had `void *` – M.M Feb 01 '16 at 02:31
  • @M.M Thanks for pointing that out. When I wrote up this question, I must've been mixed up or misinformed about when exactly the `void *` type was added, or exactly which C specs were ISO vs. ANSI. I've edited my question to say "pre-standardization C" instead of "pre-ISO C specification", which I think more accurately reflects the history. – mtraceur Feb 02 '16 at 05:40

1 Answer


> Consequently, it should be possible to take a pointer of any one of those four types that is already pointing to an address which can be legally dereferenced, typecast and/or assign it to one of the three char type pointers, and dereference it to access the same memory, with the only difference being whether your code will treat the data at that location as a `char`, `signed char`, or `unsigned char`.

This is correct. In fact, you could take a valid pointer to an object of any type, convert it to any of those three character pointer types, and access the object's memory through it.

You correctly mention the provision about `void *`, `char *`, etc. having the same representation and alignment requirements, but that actually does not matter here: it refers to the properties of the pointers themselves, not the properties of the objects being pointed to.

The strict aliasing rule (C99 6.5p7) is not violated, because it contains an explicit provision that an lvalue of character type may be used to read or write any object.

Note that if we have, for example, `signed char ch = -2;` (or any other negative value), then `(unsigned char)ch` may differ from `*(unsigned char *)&ch`. On a system with 8-bit characters, the former is guaranteed to be 254, but the latter could be 254, 253, or 130, depending on whether signed integers use two's complement, ones' complement, or sign-and-magnitude representation.

M.M
  • Took me a little bit to remember why the last paragraph was true: because the conversion rules (6.3.1 in C99, specifically 6.3.1.3p2) will make integral types behave as if they were two's complement in that case, whereas access through a pointer will expose the raw representation. That's definitely a good caveat to bring up. Anyway, this answer earns my +1 and accept. Do you think this question+answer would be more helpful to others with more explicit references to sections of the standard? If so, I'll try to find the time to dig them up and recommend edits to this answer with them. – mtraceur Feb 02 '16 at 06:07
  • @mtraceur up to you; I find it easier to read something if it's not broken up by many standard quotes, and the relevant parts are easy enough to find anyway. – M.M Feb 02 '16 at 06:45