
I have seen that NULL is the equivalent of (void*)0, but I don't understand why zero needs to be typecasted to void*. And what is really happening under the hood when we do something like this:

int *p = (int*)10;

Is (int*) extending the address space of the integer 10 (which takes 4 bytes) to 8 bytes in the above statement?

arrowd
babybob
  • Except for a literal `0`, casting an integer to a pointer is implementation-dependent. – Barmar Jun 20 '17 at 18:29
  • No, it's saying the 10th address available to the program. Chances are, it's pointing at garbage or something you do not want to mess with though. – Patrick Roberts Jun 20 '17 at 18:29
  • 4
    @PatrickRoberts: That's wrong! – too honest for this site Jun 20 '17 at 18:32
  • @Olaf not really.. assuming your program doesn't segfault when you dereference `p`, it would attempt to read the data at the 10th address of the program as an `int`. [Check out this wizardry that uses this mechanic](https://codegolf.stackexchange.com/a/122136/42091). In that case a multi-character `char` constant is being implicitly converted to a `char *` but it's the same concept, i.e. the literal value becomes the address location. – Patrick Roberts Jun 20 '17 at 18:55
  • 1
    You are not going to get a good beginner's explanation of pointers on this site, because it wouldn't fit into the answer box, even if anyone wanted to write one. At best you're going to get a hyper-pedantic answer that leaves you more confused than you already were. Go to your friendly local public library and ask the librarian for an introductory C textbook. It doesn't matter which one. – zwol Jun 20 '17 at 19:00
  • 2
    @PatrickRoberts: Please provide a reference to the standard. Conversion of an integer to a pointer is implementation defined. Typically it is the absolute address, not relative to the program (which section? `.data`? `.bss`? `.text`?) And show the definition of "garbage"; there is nothing like that in the standard. Actually dereferencing a pointer which does not point to an object invokes UB. That's all we can assume from the sparse information given. – too honest for this site Jun 20 '17 at 19:02
  • @PatrickRoberts _You_ get the pedantic answer: The effect of `int *p = (int *)10;` is implementation-defined, which means it will do _something_ predictable on each implementation you try it on -- it's not allowed to trigger the nasal demons -- now, if you *dereference* that pointer, all bets are off -- but there is no expectation that it will be the same thing on any two implementations. – zwol Jun 20 '17 at 19:04
  • 1
    @PatrickRoberts: The linked code golf invokes UB and is plain rubbish code. Writing such code is a good reason to get instantly fired. It is undebugable. Don't even think about writing such code for a program you intend to use practically. – too honest for this site Jun 20 '17 at 19:06
  • @Olaf I never said it was a standard, you inferred that. And sure it's implementation defined, but in at least two compilers on multiple linux platforms, it follows the behavior I described. Anyone with even a little C experience would know not to _actually_ do this, but I personally find it more interesting to observe what some implementations do with this in practice rather than just saying "oh, it's implementation-dependent / UB" – Patrick Roberts Jun 20 '17 at 19:06
  • 2
    @zwol: Good statement, but "It doesn't matter which one" is problematic. 1) Chances are good the library has K&R-C Rev. **1** on the shelf. 2) There are also newer books which teach bad practice and are plain wrong in some aspects. Especially when it comes to pointers, etc. there is a lot of "it works for me, so it is correct" voodoo. – too honest for this site Jun 20 '17 at 19:08
  • 1
    @olaf it says that NULL evaluates to a null pointer constant. It also says that (void*)0 is a null pointer constant. So "NULL is equivalent to (void*)0" is a reasonable/accurate statement, other than superficial lexical considerations. – Oliver Charlesworth Jun 20 '17 at 19:13
  • 1
    @OliverCharlesworth: Not really. The `NULL` macro can just as well be `#define NULL 0` - perfectly valid. Or it could be a compiler-internal token/builtin. I won't even mention the more subtle differences. – too honest for this site Jun 20 '17 at 19:15
  • 1
    Yes, really, in the sense that most people would interpret "equivalent" in this context (pointers), as opposed to "identical", say. – Oliver Charlesworth Jun 20 '17 at 19:20
  • 1
    The expression `(int*)10` is meaningless in almost all cases, and it is not valid to evaluate such a pointer. It would be Undefined Behaviour to dereference such a pointer. Unless you are using a compiler that explicitly tells you this is okay and what it means, it has no valid meaning and should be avoided. In other words, only do this if you know exactly what it does (which varies based on the system) – Justin Jun 20 '17 at 19:37
  • 1
    @Olaf, I believe an int has to be 16 bits at minimum, so it can't really fit in a byte in the sense that the word is nowadays used. – ilkkachu Jun 20 '17 at 20:00
  • 1
    @ilkkachu: That's wrong. A byte never has nor does it imply 8 bits. That's the reason all networking RFCs and other documents use the term "octet". The C standard also does not require a byte to have 8 bits. That's why `CHAR_BIT` exists. And there are still quite some systems with 16, 24 or 32 bits per byte. Please keep in mind code like the one in the question is typically used on embedded systems, such architectures are typically. POSIX systems OTOH require `CHAR_BIT == 1`, but on these dereferencing `(int *)10` definitively invokes UB in user-space (and most likely in kernel space, too). – too honest for this site Jun 20 '17 at 20:11
  • 1
    @Olaf Posix mandates CHAR_BIT == 8, not 1. UINT_MAX is required to be at least 2^16 - 1. So, an unsigned requires at least 16 bits in its representation. I agree with your point about bytes not being universally defined as 8 bits, although that is the most common meaning. – jschultz410 Jun 20 '17 at 20:24
  • @jschultz410: You did notrice that was a typo, did you? `2 ^ 16 == 18`, btw. I agree `byte == octet` is quite common. So are many missconcceptions, e.g. "the climate does not change", "earth is a disc", "atoms are the smallest particles". Draw your own conclusions … – too honest for this site Jun 20 '17 at 20:30
  • 1
    @Olaf Yes, I did "notrice." That's why I posted a correction. No, I didn't mean XOR, I meant exponentiation. My free form comment is not compilable C code, but it does at least contain correct information. – jschultz410 Jun 20 '17 at 20:38
  • 1
    In C null is not "equavalent" to `(void *) 0`. `(void *) 0` is just one possible way to define null pointer constant. You can also define it as plain `0`. So, it does not "need to be typecasted to `void *`" as you seem to incorrectly believe. – AnT stands with Russia Jun 20 '17 at 20:51
  • 1
    @BabyboBNukes NULL is often defined as (void*) 0 so that if you compare NULL to an integer or a floating point, for example, you will (usually) get a type mismatch warning that you are probably doing something wrong. If/when NULL is simply defined as 0, then you would likely not get any warning if you compared NULL against number types. – jschultz410 Jun 20 '17 at 20:56

2 Answers

4

There are a couple of ways of answering this.

We say that a pointer value is the address of a memory location. But different computers have used different addressing schemes for memory. C is a higher-level language, portable across many kinds of computers. C does not mandate a particular memory architecture. As far as the C programming language is concerned, memory addresses could literally be things like "123 Fourth Ave.", and it's hard to imagine converting back and forth between an integer and an address like that.

Now, for any machine you're likely to use, memory is actually linearly addressed in a reasonably straightforward and unsurprising way. If your program has 1,000 bytes of memory available to it, the addresses of those bytes might range from 0 up to 999. So if you say

char *cp = (char *)10;

you're just setting up a pointer to the byte located at address 10 (or, that is, the 11th byte in your program's address space).

Now, in C, a pointer is not just the raw address of some location in memory. In C, a pointer is also declared to specify what type of data it points to. So if we say

int *ip = (int *)10;

we're setting up a pointer to one int's worth of data located at address 10. It's the same point in memory as cp pointed to, but since it's an int pointer, it's going to access an int's worth of bytes, not one byte like cp did. If we're on an old 16-bit machine, and int is two bytes, we could think of ip as pointing at the fifth int in our address space.

A cast in C can actually do two things: (1) convert a value ("change the bits"), or (2) change the interpretation of a value. If we say float f = (float)3;, we're converting between the integer representation of 3 and a floating-point representation of 3, which is likely to be quite different. If we go in the other direction, with something like int i = (int)3.14;, we're also throwing away the fractional part, so there's even more conversion going on. But if we say int *ip = (int *)10;, on a typical machine we're not really doing anything with the bits of the value 10; we're just reinterpreting them as a pointer. And if we say char *cp = (char *)ip, we're again typically not changing any bits, we're just reinterpreting them as a different kind of pointer.

I hasten to add, though, that everything I've said here about pointer conversions is (a) very low-level and machine-dependent, (b) not the sort of thing that ordinary C programmers are supposed to have to think about during ordinary programming tasks, and (c) not guaranteed by the C language.

In particular, even when programming for a computer with a conventional, linearly-addressed memory model, it's likely that your program doesn't have access to address 10, so these pointers (cp and ip) might be pretty useless, and might generate exceptions if you try to use them. (Also, when we have a pointer like ip that points at more than 1 byte, there's the question of which bytes it points to. If ip is 10, it probably points at bytes 10 and 11 on a 16-bit, byte-addressed machine, but which of those two bytes is the low-order half of the int and which is the high-order half? It depends on whether it's a "big endian" or "little endian" machine.)

But then we come to null pointers. When you use a constant "0" as a pointer value, things are a little different. If you say

void *p = (void *)0;

you are not, strictly speaking, saying "make p point to address 0". Instead, you are saying "make p be a null pointer". But it turns out this has nothing to do with the cast; it's because of a special case in the language: in a pointer context, the constant 0 is a null pointer constant.

A null pointer is a special pointer value that's defined to point nowhere. It might be represented internally as a pointer to address 0, or it might be represented some other way. (If it is in fact represented as a pointer to address 0, your compiler will be careful to arrange that there's never any actual data at address 0, so that it's still true that the pointer "points nowhere" even though it points to address 0. This is sort of confusing, sorry about that.)

Although pointers to raw addresses like 10 are low-level and dangerous and machine-dependent, null pointers are well-defined and perfectly fine. For example, when you call malloc and it can't give you the memory you asked for, it returns a null pointer to tell you so. When you test malloc's return value to see if it succeeded or failed, you just check to see if it gave you a null pointer or not, and there's nothing low-level or nonportable or discouraged about doing so.

See http://c-faq.com/null/index.html for much more on all this.

Steve Summit
  • `char *cp = (int *)10;` - no typo? – too honest for this site Jun 20 '17 at 20:13
  • "… or (2) change the interpretation of a value" - Actually no. It is always a conversion. C has no `reinterpret_cast` like C++ introduced. The typical reinterpretation via pointer-cast invokes undefined behaviour (of which alignment is the least of problems) with few exceptions. Hence the explicit allowance to alias via a `union` - because that's the only legal way. Also _null pointers_ are not necessarily a single value; one alternative would be to set a bit in the pointer (which could help debugging code quite a lot actually). – too honest for this site Jun 20 '17 at 20:58
  • @Olaf On the vast majority of popular machines, `char *cp = (char *)ip` doesn't change any bits when going from `ip` to `cp`, which is of course what I meant by not converting. – Steve Summit Jun 20 '17 at 22:13
  • That still is a conversion: 6.3.2.3p7 and related. Whether bits are changed or not is not relevant. A 1:1 copy is still a conversion. – too honest for this site Jun 20 '17 at 22:16
  • And another typo `int *ip = (int)10;` – user58697 Jun 20 '17 at 23:35
  • @user58697 Thanks, fixed. – Steve Summit Jun 21 '17 at 01:26
-1

A pointer is a short sequence of bytes. Via a cast, C lets you treat those bytes as an integer.

Type* p = ...;
intptr_t i = (intptr_t)p;

This is occasionally (but rarely) useful when you need to pass a pointer to an interface expecting an integer. To recover the pointer, one just reverses the cast.

Type* recovered_p = (Type*)i;

This does not allocate any memory. You can only dereference recovered_p if i contains bytes that, when treated as a Type*, reference a previously allocated Type value. This means that the following doesn't produce a usable pointer:

int *p = (int*)10;

An example that uses an integer to store a pointer.

typedef void (*Visitor)(intptr_t, ListNode*);

void List_visit(List* list, Visitor visitor, intptr_t arg) {
   for (ListNode* node = list->head; node; node=node->next) {
      visitor(arg, node);
   }
}

void printer(intptr_t arg, ListNode* node) {
   State* state = (State*)arg;   /* recover the pointer that was passed as an integer */
   printf("%*s%s\n", ( state->count++ )*2, "", node->value);
}

int main(void) {
   List* list = ...;
   State* state = ...;
   List_visit(list, printer, (intptr_t)state);
   List_free(list);
   State_free(state);
   return 0;
}
ikegami
  • Here's another example where it's used: Perl extensions written in C often need to deal with handles/pointers created by C libraries. Those pointers need to be converted to something that exists in Perl land, so they're converted into integers (inside of Perl objects of the extension's class), to be converted back to pointers and used by later calls to the C library. – ikegami Jun 20 '17 at 20:49
  • 1
    @jschultz410, Thanks for the fixes, but please don't impose your [personal style](https://stackoverflow.com/q/6990726/589924) on other people. – ikegami Jun 20 '17 at 20:56
  • 1
    Something to think about (please don't reply, I don't want to start a style war): `int *p, i` - `*p` is a declarator, the `*` grammatically **and semantically** belongs to the `p`, not the type specifier. Hence `int* p, i;` would be visually misleading. Although syntactically irrelevant, especially for beginners such details are irritating (actually not only for beginners). – too honest for this site Jun 20 '17 at 21:10
  • @Olaf, I fully agree that declaring more than one var at a time is a bad idea if one is a pointer. – ikegami Jun 21 '17 at 01:03
  • @ikegami: That's like using a horse to pull a car to save fuel instead of using a carriage (which is more appropriate when using a horse). It is perfectly valid and the common syntax very well supports it visually. – too honest for this site Jun 21 '17 at 01:13
  • @Antti Haapala, When replying to a comment showing the presence of two valid views, it doesn't make sense to point that one of those two views exists. You really should read the top answer of the question I linked. – ikegami Jun 21 '17 at 02:29
  • @ikegami: As you are a high rep user, I don't assume you are misreading my comments. So I assume you are kidding. It is quite obvious what I mean: `int* p` (omitting the second declarator for clarity) is visually misleading as it does not represent the grammatical and semantic structure. It also differs from the normal usage later on: `*p`. – too honest for this site Jun 21 '17 at 09:52
  • @ikegami: It seems you did not understand my comments. My reasoning is based on the standard. Maybe reading the standard help. – too honest for this site Jun 21 '17 at 14:38
  • @Olaf, Wait, you're saying that `int* p;` alone is visually misleading? Not at all!! There's no possible confusion as to what it allocates. It allocates a pointer to an integer, and that pointer is called `p`. – ikegami Jun 21 '17 at 14:38
  • It's been 20 minutes, and I can't even fathom what other effect the reader could be fooled into thinking `int* p;` does. – ikegami Jun 21 '17 at 15:04
  • Re "*My reasoning is based on the standard*", Your choice of style, you mean. Mine is based on readability. /// Re "*Maybe reading the standard help.*", No, I'm aware of why you chose to code the way you do. As I pointed out in the comment that elicited your first reply, it's a perfectly legit way of doing things. I just think readability is more important. – ikegami Jun 22 '17 at 06:49