
In a reputable source about C, the following information is given after discussing the & operator:

... It's a bit unfortunate that the terminology [address of] remains, because it confuses those who don't know what addresses are about, and misleads those who do: thinking about pointers as if they were addresses usually leads to grief...

Other materials I have read (from equally reputable sources, I would say) have always unabashedly referred to pointers and the & operator as giving memory addresses. I would love to get to the truth of the matter, but it is kind of difficult when reputable sources KIND OF disagree.

Now I am slightly confused--what exactly is a pointer, then, if not a memory address?

P.S.

The author later says: ...I will continue to use the term 'address of' though, because to invent a different one [term] would be even worse.

Jonathan Leffler
d0rmLife
  • 126
    A pointer is a **variable** that **holds** an address. It also has its *own* address. This is the fundamental difference between a pointer and an array. An array effectively *is* an address (and by implication, its address is *itself*). – WhozCraig Mar 01 '13 at 05:53
  • 3
    @WhozCraig Indeed, but the quote is especially keen to note that `&` doesn't return an actual memory address. So what does it return then? – d0rmLife Mar 01 '13 at 05:54
  • http://programmers.stackexchange.com/questions/17898/whats-a-nice-explanation-for-pointers – Krishnabhadra Mar 01 '13 at 05:56
  • 8
    What's your "reputable source" for the quote? – Cornstalks Mar 01 '13 at 05:59
  • Pointer has no address, pointer is a value of a "pointer variable" that has an address. But this is neat, so we have pointers as pointer values and pointers as pointer variables. Anything else to confuse newbies even more? – exebook Mar 01 '13 at 05:59
  • 3
    @exebook pointer has an address because the pointer is also a variable that is holding an address. – Aniket Inge Mar 01 '13 at 06:00
  • 2
    @exebook Really? pointer has no address? Mkk.`int *p = NULL, **pp = &p;` So much for that. – WhozCraig Mar 01 '13 at 06:00
  • @Cornstalks Mike Banahan, author of "The C Book". Also available @ http://publications.gbdirect.co.uk/c_book/chapter5/pointers.html – d0rmLife Mar 01 '13 at 06:02
  • @WhozCraig how about @NULL? NULL is a pointer as well, but is it a variable? How about (char*)55555? Is this a pointer, does it have an address? – exebook Mar 01 '13 at 06:02
  • @exebook NULL is not a pointer. It's technically not even an address. – WhozCraig Mar 01 '13 at 06:02
  • 1
    @WhozCraig why do they call it "null pointer exception" then? – exebook Mar 01 '13 at 06:06
  • 1
    @d0rmLife Yeah, I think that author needs to step back and climb down from the pillar. His logic for continuing to use a term he so freshly assailed earlier is laughable. And as far as the definition is concerned: *C99 6.5.3.2p3: "The unary & operator yields the **address of** its operand."* The *type* of address returned is specific to the variable to which it is applied, but it is no mistake that the standard uses that language. – WhozCraig Mar 01 '13 at 06:06
  • @exebook you're confused. All your examples have an address. Where and how, that's a different discussion. And yes, pointers **have** addresses. – Rad'Val Mar 01 '13 at 06:07
  • 1
    The author could have made a helluva case for confusion between *addresses* and the variables that hold them (i.e. pointers) just by dissecting exebook's comments in this question. – WhozCraig Mar 01 '13 at 06:08
  • why is it that every answer assumes that a memory address is an integer... where does it say that this needs to be true? there is something more profound in a pointer that makes it not an address beyond the fact that it doesn't need to be an integer. – thang Mar 01 '13 at 06:25
  • 23
    The ultimate reputable source is the language standard and not books semi-derived from it and semi-pulled-from-the-author's-butt. I learned it the hard way, making almost every mistake I could and slowly building a mental model of C somewhat close that of described by the standard and then finally replacing said model with the standard's model. – Alexey Frunze Mar 01 '13 at 06:26
  • I wish language standards were written so newbies can learn language using them. Now you can usually only relearn language you already know by reading standards. – exebook Mar 01 '13 at 06:28
  • 9
    @thang People think pointer=integer because it is often so (x86 Linux and Windows "teach" us that), because people love generalizing, because people don't know the language standard well and because they've had little experience with radically different platforms. Those same people are likely to assume that a pointer to data and a pointer to a function can be converted to one another and data can be executed as code and code be accessed as data. This may be true on von Neumann architectures (with one address space), but it is not necessarily true on Harvard architectures (with separate code and data spaces). – Alexey Frunze Mar 01 '13 at 06:33
  • 1
    @AlexeyFrunze, the question is **why is it that every answer assumes that a memory address is an integer**, and your answer is **People think pointer=integer because it is often**. do you see what's wrong here? are you implicitly assuming that pointer=memory address by Gricean... – thang Mar 01 '13 at 06:35
  • 6
    @exebook Standards are not for newbies (especially, complete ones). They aren't supposed to provide gentle introductions and multitudes of examples. They formally define something, so it can be correctly implemented by professionals. – Alexey Frunze Mar 01 '13 at 06:38
  • 1
    @thang I think I've made my points clear in my answer and in my comments. – Alexey Frunze Mar 01 '13 at 06:40
  • A stack pointer is a pointer too, but doesn't have an address. It really boils down to this: a pointer is a "concept of referring to something with its address". – Aki Suihkonen Mar 01 '13 at 10:35
  • 1
    NULL is not a valid address, yet NULL can be the value of a pointer variable. – kutschkem Mar 01 '13 at 13:15
  • 1
    Several people have declared that a pointer is a variable. Others disagree (@Aki - stack pointer; http://en.wikipedia.org/wiki/Pointer_%28computer_programming%29 "a data type"). Can someone cite an authoritative reference? Thanks. – LarsH Mar 01 '13 at 15:03
  • 1
    @WhozCraig, any citation to support your initial comment (in particular, the idea that a pointer must be a variable)? Just trying to learn whether this is a few people's opinion or an established fact. For example, man 3 fopen() says that the function returns a FILE pointer, yet obviously it doesn't return a variable. Is the man page wrong? – LarsH Mar 01 '13 at 19:51
  • @kutschkem NULL is a valid address on some operating systems (or lack of operating systems). Just because the OS you're on will give you an access violation when attempting to read/write NULL doesn't mean it's not a valid address ever. – Zach Mar 01 '13 at 22:02
  • @Zach According to the standard, dereferencing NULL is undefined behavior, in C++ at least: http://stackoverflow.com/questions/4364536/c-null-reference. Seems a really bad decision to have 0 as a valid pointer. – CiscoIPPhone Mar 01 '13 at 23:03
  • @CiscoIPPhone When you strip out the operating system, and simply have an address space the size of the number of bits representing all pointer values such as in the Xilinx Microblaze, it doesn't matter where you put things in that space, however NULL can be defined to be 0xFFFFFFFF on that platform if you map nothing to that location in the address space therefore 0 can be a valid pointer as NULL doesn't have to be 0. – DX-MON Mar 02 '13 at 12:43
  • Possible duplicate of [Pointer implementation details in C](http://stackoverflow.com/questions/1352500/pointer-implementation-details-in-c) – Peter Mortensen Mar 05 '13 at 22:15
  • @d0rmLife I think you can now accept an answer. The question's been open for almost two months and has received enough attention. – Alexey Frunze Apr 28 '13 at 06:19
  • "The C++ (and C) notion of array and pointer are direct representations of a machine's notion of memory and addresses, provided with no overhead." ~ B. Stroustrup – G.Rassovsky May 19 '15 at 08:57
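WhozCraig's first comment contrasts a pointer variable, which is an object with its own storage and address, with an array. A minimal sketch of that difference, assuming a C99 or later compiler; the function names are hypothetical and exist only for illustration:

```c
#include <stddef.h>

/* Returns 1: the address of the whole array and the address of its
   first element compare equal once both are converted to void *. */
int array_address_is_first_element(void)
{
    int arr[4] = {1, 2, 3, 4};
    return (void *)&arr == (void *)arr;  /* &arr is int (*)[4]; arr decays to int * */
}

/* Returns 1: a pointer variable's own address differs from the
   address it holds. */
int pointer_has_own_address(void)
{
    int arr[4] = {1, 2, 3, 4};
    int *p = arr;                        /* p holds the address of arr[0] */
    return (void *)&p != (void *)p;      /* &p is where p itself is stored */
}

/* sizeof sees the whole array, but only the pointer object for p. */
size_t array_size(void)   { int arr[4]; return sizeof arr; }
size_t pointer_size(void) { int *p = NULL; return sizeof p; }
```

On a typical platform `array_size()` is 16 and `pointer_size()` is 8, but only the relationships to `sizeof(int)` and `sizeof(int *)` are guaranteed.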

24 Answers

160

The C standard does not define what a pointer is internally and how it works internally. This is intentional so as not to limit the number of platforms where C can be implemented as a compiled or interpreted language.

A pointer value can be some kind of ID or handle or a combination of several IDs (say hello to x86 segments and offsets) and not necessarily a real memory address. This ID could be anything, even a fixed-size text string. Non-address representations may be especially useful for a C interpreter.
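To make the idea of a non-address pointer concrete, here is a minimal sketch of how a hypothetical C interpreter might represent a pointer as an object ID plus an element offset instead of a machine address. The `Handle` type, the `objects` table and the `handle_*` functions are illustrative assumptions, not taken from any real implementation:

```c
#include <stddef.h>

/* A hypothetical "pointer" for an interpreter: which object, and
   which element inside it. No machine address is stored. */
typedef struct {
    size_t object_id;   /* index into the interpreter's object table */
    size_t offset;      /* element index within that object */
} Handle;

/* The interpreter's storage: each object is a small int array. */
enum { MAX_OBJECTS = 4, OBJECT_LEN = 8 };
static int objects[MAX_OBJECTS][OBJECT_LEN];

/* Pointer arithmetic only moves the offset; the object ID never
   changes, so arithmetic cannot walk from one object into another. */
Handle handle_add(Handle h, size_t n)
{
    h.offset += n;
    return h;
}

/* Dereference: translate the handle into real storage, with a bounds
   check. Returns 0 and leaves *out untouched if the handle is invalid. */
int handle_load(Handle h, int *out)
{
    if (h.object_id >= MAX_OBJECTS || h.offset >= OBJECT_LEN)
        return 0;
    *out = objects[h.object_id][h.offset];
    return 1;
}

/* Store through a handle, with the same bounds check. */
int handle_store(Handle h, int value)
{
    if (h.object_id >= MAX_OBJECTS || h.offset >= OBJECT_LEN)
        return 0;
    objects[h.object_id][h.offset] = value;
    return 1;
}
```

As long as the implementation performs this translation consistently, a C program using `*`, `&` and pointer arithmetic cannot tell that its pointers are not addresses.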

Alexey Frunze
  • Could explain a little bit about handlers/ID's? I think that will help my understanding. I haven't come across those terms until now! – d0rmLife Mar 01 '13 at 06:07
  • 36
    There's not much to explain. Every variable has its address in memory. But you don't have to store their addresses in pointers to them. Instead you can number your variables from 1 to whatever and store that number in the pointer. That is perfectly legal per the language standard so long as the implementation knows how to transform those numbers into addresses and how to do pointer arithmetic with those numbers and all other things required by the standard. – Alexey Frunze Mar 01 '13 at 06:12
  • sounds nice but how about this then: int i = 555, *p = &i; p++; // this is valid C and valid C++ and will work only if pointers are addresses. – exebook Mar 01 '13 at 06:23
  • 4
    i would like to add that on x86, a memory address consists of a segment selector and an offset, so representing a pointer as segment:offset is still using memory address. – thang Mar 01 '13 at 06:28
  • 1
    @thang That's fine, but such a pair of values is not a simple single integer address anymore. – Alexey Frunze Mar 01 '13 at 06:34
  • 1
    @AlexeyFrunze, it's not an integer, but it's still an address. where do you all of a sudden get "integer address" from? it's also not a potato, but i don't care that it's not a potato. – thang Mar 01 '13 at 06:41
  • 1
    @thang It's an x86 address, yes. – Alexey Frunze Mar 01 '13 at 06:47
  • 1
    @thang how is the selector and offset stored? It's still an integer value right? Although it doesn't *directly* represent an address – Rad'Val Mar 01 '13 at 06:58
  • 3
    @ValentinRadu you may argue that it is two integers because it doesn't express the ordering property of integers. otherwise, you can say that any data is an integer, albeit a very large one. a 5 mb file contains one very big integer... – thang Mar 01 '13 at 07:13
  • 2
    The C standard may not mention what a pointer is, but in reality a pointer is either an integer containing an absolute address, or an integer containing a virtual address, in case the CPU/OS supports that concept. No other implementation of pointers exists, or is likely to ever exist. So to think of pointers as "fuzzy entities" will not be of help to anyone, certainly not to a beginner. – Lundin Mar 01 '13 at 07:37
  • 1
    @Lundin I agree with the last sentence in terms of understanding the concept. However, likely/unlikely is exactly fuzzy and that's the whole point, the pointer is fuzzy per the standard. Obviously, it's not fuzzy in, say, gcc targeting x86 CPUs in 32-bit or 64-bit modes. – Alexey Frunze Mar 01 '13 at 07:50
  • So, who's the higher authority than the C standard that feels that the C standard is wrong along with my restatement of its treatment of pointers? Be ashamed of downvoting. – Alexey Frunze Mar 01 '13 at 07:54
  • 2
    As I wrote in a comment to another answer, there are cases where you should just ignore the C standard, because it is too generic. This is one such example, another example is the implementation of signed numbers. It does you no good in the real world to believe that pointers are "fuzzy entities", nor does it do you any good to go ask for a "sign & magnitude CPU" in your computer store. – Lundin Mar 01 '13 at 08:04
  • 4
    One has to be aware that ISO places restrictions on a standard, saying that it may not favour one existing technique on the market in front of another, it must be impartial and not favour a particular company. So ISO C cannot take sides in little VS big endian, it can not state that two's complement is universal because some hobo computer from the 70s had one's complement, and so on. We humans in the real world outside ISO should maintain a sober approach though. The C standard is not a holy book. – Lundin Mar 01 '13 at 08:05
  • 7
    @Lundin I have no problems ignoring the generic nature of the standard and the inapplicable when I know my platform and my compiler. The original question is generic, however, so you can't ignore the standard when answering it. – Alexey Frunze Mar 01 '13 at 08:15
  • 1
    The OP is obviously a beginner and not a computer scientist busy inventing some new, revolutionary pointer implementation. So for the sake of pedagogy, I believe that the C standard should be ignored. – Lundin Mar 01 '13 at 09:00
  • 2
    @exebook: it's valid, but accessing *p will be undefined behavior. There is nothing in the standard that says *p will then point to anything useful; it's like accessing an out-of-bounds array. – Lie Ryan Mar 01 '13 at 09:10
  • 9
    @Lundin You don't need to be revolutionary or a scientist. Suppose you want to emulate a 32-bit machine on a physical 16-bit machine and you extend your 64KB of RAM to up to 4GB by using disk storage and implement 32-bit pointers as offsets into a huge file. Those pointers aren't real memory addresses. – Alexey Frunze Mar 01 '13 at 09:20
  • @AlexeyFrunze Indeed they aren't, they are either virtual addresses, which behave in the same manner as far as the C programmer is concerned, or they are non-standard extended addresses, "far pointers". The compiler implementation for such a system will have to make the translation between physical addresses and pointer variables in the C program. Because if the (contents of) pointers don't behave as addresses, then pointer arithmetic and indirect addressing will fail and the C program will turn useless. – Lundin Mar 01 '13 at 09:26
  • @AlexeyFrunze that ID is actually called [Logical address](http://en.wikipedia.org/wiki/Logical_address). Correct? – Grijesh Chauhan Mar 01 '13 at 11:49
  • If it can theoretically be anything, how would pointer arithmetic work? Like `somePointer + 1` or `pointerA - pointerB`; AFAIK the standard says that the difference of two pointers must be an integer, no? – Andrey Mar 01 '13 at 17:03
  • @Lundin, other implementations of pointers *do* exist, although they aren't common on popular modern platforms. See http://c-faq.com/null/machexamp.html. – Russell Borogove Mar 02 '13 at 01:55
  • 3
    @Andrey, the compiler has to do object-size fixup when subtracting pointers (it has to give you the difference in number of elements, not number of bytes), so we already know that pointer arithmetic isn't integer arithmetic. – Russell Borogove Mar 02 '13 at 01:56
  • 1
    One point I do not see made yet is that, if you attempt to use pointers as if they were addresses, in ways not defined by the C standard, the compiler may produce strange results when optimization is applied. As an example, suppose you have `struct { int a[4], b; } x;`. Although `b` must be immediately after `a` in the struct, the optimizer may conclude that `*(a+4)` is **not** a reference to `b`, because `*(x.a+4)` is not a valid way to refer to `b`. Therefore, code such as `x.b = 3; *(x.a+4) = 5; printf("%d", x.b);` may print “3” even though it would be “5” if pointers were just addresses. – Eric Postpischil Mar 04 '13 at 16:14
  • 6
    The best example I've ever seen of this was the C implementation for Symbolics Lisp Machines (circa 1990). Each C object was implemented as a Lisp array, and pointers were implemented as a pair of an array and an index. Because of Lisp's array bounds checking, you could never overflow from one object to another. – Barmar Mar 05 '13 at 21:20
  • 2
    Just when you think you know a concept in C pretty well - out comes an answer on SO or in the standards that defy and destroy everything you thought was right. A person can never fully learn C until he has studied the standards. +1 sir, excellent accuracy of understanding and enlightened soul :-) – Aniket Inge Mar 25 '13 at 18:45
  • @EricPostpischil what your example illustrates is that the language doesn't oblige the compiler to correctly identify such abuses, so it may use a value based on static analysis or from a register it used to load x.b without reloading the memory address (unless a and b are volatile) - all that's got absolutely nothing to do with whether "pointers [are] just addresses". – Tony Delroy Apr 11 '14 at 00:34
  • @exebook "sounds nice but how about this then: int i = 555, *p = &i; p++; // this is valid C and valid C++ and will work only if pointers are adresses." I'm a bit late to the party, but this comment is wrong. That will work whatever pointers are. There is no rule that `p++` must add some fixed amount to `p`. It could, for example, figure out the ID of the next object in the array that `p` currently holds the ID to an object in. – David Schwartz Apr 22 '20 at 19:30
  • @AlexeyFrunze While this interpretation makes a lot of sense, I don't understand how such interpretation (that pointer is not an address) work with alignment. C standard defines alignment as "requirement that objects of a particular type be located on storage boundaries with addresses that are particular multiples of a byte address". If pointers aren't thought as entities that hold addresses, alignment doesn't seem to make any sense at all. – cubuspl42 Oct 11 '20 at 15:28
  • @cubuspl42 Alignment may be optional. But we can go deeper even without touching alignment. We are allowed to access every C byte in everything through a pointer to a char (unsigned char is safest). And we're allowed to make a pointer beyond the last array element and the same is allowed with non-arrays. So, that kinda asks for pointers to really be addresses since it would be easy to access adjacent things by doing pointer arithmetic. However, there's no pointer arithmetic on things that aren't part of the same larger thing (e.g. separate variables or malloc()'d regions). – Alexey Frunze Oct 11 '20 at 23:54
  • @cubuspl42 So, addresses need not be true and universal representation of pointers. And like I said earlier, they can really be locations in something else, e.g. a file and not memory. – Alexey Frunze Oct 11 '20 at 23:54
  • @AlexeyFrunze Without bringing the alignment to the table, I'm still convinced by your original explanation. The ability of interpreting every object as a byte array means just that all objects (that are separate by nature) are built of bytes, and you can read their bytes as (unsigned) char even if the object's effective type is not (unsigned) char or an array of (unsigned) chars. They can still live in an abstract set of objects, no addressable memory involved. Pointers can point to these objects; if they're arrays, you can iterate that array with a pointer. – cubuspl42 Oct 13 '20 at 16:31
  • You can search the standard for the word "address". In my copy (n1570 draft), there are just 79 occurrences. Roughly half of them are uses in the context of alignment (and those uses imply that objects live in an addressable linear memory built of bytes, and those addresses can be a multiple of something) and the other half use the word "address" as a synonym for "pointer" or for something that a pointer holds. – cubuspl42 Oct 13 '20 at 16:40
  • So, basically, I'm just saying that I like your answer and the explanation it contains, because standard doesn't say pointers per se are addresses. _But_ in context of alignment, standard does say (or at least strongly imply) that "address" = "linear memory address" and then in other parts effectively state that "address" = "pointer". So, effectively, "pointer" = "memory address". – cubuspl42 Oct 13 '20 at 16:51
  • I don't mean that "memory" has to mean assembly level address to a virtual or physical memory / RAM. Of course it still doesn't rule out a file-based implementation, or an interpreter. But, at least for C11, I believe you'll be right to think that a pointer holds a (some-kind-of-) memory address, or at least a "data storage" address (which is a synonym for some kind of linear memory). – cubuspl42 Oct 13 '20 at 17:09
64

I'm not sure about your source, but the type of language you're describing comes from the C standard:

6.5.3.2 Address and indirection operators
[...]
3. The unary & operator yields the address of its operand. [...]

So... yeah, pointers point to memory addresses. At least, that's the meaning the C standard suggests.

To say it a bit more clearly, a pointer is a variable holding the value of some address. The address of an object (which may be stored in a pointer) is returned with the unary & operator.

I can store the address "42 Wallaby Way, Sydney" in a variable (and that variable would be a "pointer" of sorts, but since that's not a memory address it's not something we'd properly call a "pointer"). Your computer has addresses for its buckets of memory. Pointers store the value of an address (i.e. a pointer stores the value "42 Wallaby Way, Sydney", which is an address).
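A minimal sketch of the quoted wording in practice: unary `&` yields a value of type "pointer to T", a pointer object stores that value, and unary `*` follows it back to the object. The function names are hypothetical:

```c
/* Stores the address of x in p, then writes through p.
   Returns the final value of x. */
int write_through_pointer(int start, int replacement)
{
    int x = start;
    int *p = &x;        /* unary & yields a value of type "pointer to int" */
    *p = replacement;   /* unary * designates the object p points to, i.e. x */
    return x;
}

/* Returns 1: p compares equal to &x, and *p reads the same value as x. */
int pointer_refers_to_object(void)
{
    int x = 42;
    int *p = &x;
    return p == &x && *p == x;
}
```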

Edit: I want to expand on Alexey Frunze's comment.

What exactly is a pointer? Let's look at the C standard:

6.2.5 Types
[...]
20. [...]
A pointer type may be derived from a function type or an object type, called the referenced type. A pointer type describes an object whose value provides a reference to an entity of the referenced type. A pointer type derived from the referenced type T is sometimes called ‘‘pointer to T’’. The construction of a pointer type from a referenced type is called ‘‘pointer type derivation’’. A pointer type is a complete object type.

Essentially, pointers store a value that provides a reference to some object or function. Kind of. Pointers are intended to store a value that provides a reference to some object or function, but that's not always the case:

6.3.2.3 Pointers
[...]
5. An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

The above quote says that we can turn an integer into a pointer. If we do that (that is, if we stuff an integer value into a pointer instead of a specific reference to an object or function), then the pointer "might not point to an entity of the referenced type" (i.e. it may not provide a reference to an object or function). It might provide us with something else. And this is one place where you might stick some kind of handle or ID in a pointer (i.e. the pointer isn't pointing to an object; it's storing a value that represents something, but that value may not be an address).

So yes, as Alexey Frunze says, it's possible a pointer isn't storing an address to an object or function. It's possible a pointer is instead storing some kind of "handle" or ID, and you can do this by assigning some arbitrary integer value to a pointer. What this handle or ID represents depends on the system/environment/context. So long as your system/implementation can make sense of the value, you're in good shape (but that depends on the specific value and the specific system/implementation).

Normally, a pointer stores an address to an object or function. If it isn't storing an actual address (to an object or function), the result is implementation defined (meaning that exactly what happens and what the pointer now represents depends on your system and implementation, so it might be a handle or ID on a particular system, but using the same code/value on another system might crash your program).
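The conversion in the quoted 6.3.2.3 paragraph can also be looked at from the other direction. In C99 and later, the optional type `uintptr_t` from `<stdint.h>` guarantees that converting a valid `void *` to it and back yields a pointer that compares equal to the original; what the integer value itself means is still up to the implementation. A minimal sketch, assuming the implementation provides `uintptr_t` (most do, but the standard does not require it); the function name is hypothetical:

```c
#include <stdint.h>

/* Round-trips a pointer through uintptr_t. The standard only promises
   that the result compares equal to the original pointer; it says
   nothing about what the integer looks like. */
int round_trip_preserves_pointer(void *p)
{
    uintptr_t bits = (uintptr_t)p;   /* implementation-defined value */
    void *back = (void *)bits;       /* convert back to a pointer */
    return back == p;
}
```

Going the other way, from an arbitrary integer such as `(int *)0x1234` that did not come from a pointer, is exactly the case the quoted paragraph leaves implementation-defined.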

That ended up being longer than I thought it would be...

Community
Cornstalks
  • 4
    In a C interpreter, a pointer may hold a non-address ID/handle/etc. – Alexey Frunze Mar 01 '13 at 06:03
  • C interpreter is not C language described by standard. It's "interpreted-C" which is different animal, and non-standardized as far as I know. – exebook Mar 01 '13 at 06:30
  • 1
    @AlexeyFrunze: I added a (lengthy) expansion on your comment. Feel free to critique. – Cornstalks Mar 01 '13 at 06:38
  • 5
    @exebook The standard is not anyhow limited to compiled C. – Alexey Frunze Mar 01 '13 at 06:46
  • Is it meaningful though, to pretend that a pointer can store some sort of "ID"? Because in the cold harsh reality outside ISO 9899, no such implementations exist. Pointers are implemented as a variable containing an address, in raw integer format, on 100% of the existing computer systems in the real world. There are some cases where you should just ignore the standard, because it is needlessly generic. – Lundin Mar 01 '13 at 07:55
  • 8
    @Lundin Bravo! Let's ignore the standard more! As if we haven't ignored it enough already and haven't produced buggy and poorly portable software because of it. Also, please note that the original question is generic and as such needs a generic answer. – Alexey Frunze Mar 01 '13 at 08:09
  • @AlexeyFrunze I'm actually a firm believer of standard compliance and preach it whenever I get a chance. But you can't blindly accept everything in the standard, you need to use rational thinking and question if it applies to the real world. I have vast experience of reading and interpreting boring technical standards and the same applies to every single one of them. If you can conclude that the standard says something, but it doesn't make sense in the real world, you address this in documentation and ignore the standard. Any 3rd party notified body reviewing your product will accept such. – Lundin Mar 01 '13 at 09:21
  • 2
    @Lundin, what do you think about the notes about the Renesas D10V [here](http://sourceware.org/gdb/onlinedocs/gdbint/Pointers-and-Addresses.html)? Are you arguing that these are addresses in raw integer format or that the D10V doesn't exist? – Samuel Edwin Ward Mar 01 '13 at 15:04
  • 3
    When others are saying that a pointer might be a handle or something else other than an address, they do not just mean that you can coerce data into a pointer by casting an integer into a pointer. They mean the compiler might be using something other than memory addresses to implement pointers. On the Alpha processor with DEC’s ABI, a function pointer was not the address of the function but was the address of a descriptor of a function, and the descriptor contained the address of the function and some data about the function parameters. The point is that the C standard is very flexible. – Eric Postpischil Mar 01 '13 at 15:17
  • @SamuelEdwinWard There are plenty of odd cases where a program cannot address physical memory directly, for various reasons, but uses some virtual addressing scheme instead. See my latest comment below Alexey's answer. – Lundin Mar 01 '13 at 15:17
  • 5
    @Lundin: The assertion that pointers are implemented as integer addresses on 100% of the existing computer systems in the real world is false. Computers exist with word addressing and segment-offset addressing. Compilers still exist with support for near and far pointers. PDP-11 computers exist, with RSX-11 and the Task Builder and its overlays, in which a pointer must identify the information needed to load a function from disk. A pointer cannot have the memory address of an object if the object is not in memory! – Eric Postpischil Mar 01 '13 at 15:32
  • @Samuel It seems they are still integers, and they still point to memory addresses. Just some point to byte address and some point to word address. I'm not sure about Lundin's claim, but this isn't a counterexample. I once worked with a microprocessor that had completely separate data and code memory, each starting at 0x0000. you could not interchange the pointers, but they were still integers. D10V appears to be the same way... you can not interchange the pointers, but they are still integers. – Mr.Mindor Mar 01 '13 at 15:32
  • 2
    @Mr.Mindor floating point numbers are integers too if you feel like it. But to say I can have two pointers to the same piece of memory, and they are both just integers, and the values of those integers aren't the same... what's the point of even saying they're integers at that point? – Samuel Edwin Ward Mar 01 '13 at 16:22
  • @SamuelEdwinWard Because they still are. I'm not claiming that the fact they are integers is useful in the way you seem to want it to be, just that your example of a system where the integer value is interpreted differently depending on the pointer type doesn't refute Lundin's claim that pointers are integers which point to an address in all implementations (which, btw I'm not espousing). – Mr.Mindor Mar 05 '13 at 16:41
  • @Mr.Mindor Technically not, IMO, because the content of a memory location (or a memory word), is not in itself an integer, it is a sequence of bytes, whose interpretation as integers (or pointers, or FP numbers) is not absolute, but depends on the operations you perform on them. Take a memory-mapped 8-bit register of some digital I/O card, for example, and suppose the single bits represent the state of some external lines. Is it meaningful to call the content of that memory location an integer? Every single bit is completely independent, so how can they be interpreted as a number? – LorenzoDonati4Ukraine-OnStrike Sep 22 '13 at 10:42
  • @AlexeyFrunze it seems hard to argue with the Standard quote *The unary & operator yields the address of its operand. If the operand has type ‘‘type’’, the result has type ‘‘pointer to type’’*. This implies that the value stored in a pointer is the address of the thing being pointed to. There cannot be a "non-address" stored in a pointer. The C interpreter you refer to would represent its addresses in textual form or whatever. – M.M Mar 02 '15 at 22:30
37

[Figure: Pointer vs Variable]

In this picture, pointer_p is a pointer located at 0x12345 that points to a variable, variable_v, at 0x34567.
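The picture's layout can be reproduced in code: `pointer_p` is itself an object with its own address, and a pointer to it (`int **`) reaches `variable_v` in two steps, much like WhozCraig's `int *p = NULL, **pp = &p;` comment under the question. The concrete addresses 0x12345 and 0x34567 will differ on every run and platform, so this sketch compares pointers instead of printing them. The function names are hypothetical:

```c
/* Models the picture: pointer_p holds the address of variable_v,
   and pointer_to_p holds the address of pointer_p itself. */
int follow_two_levels(int value)
{
    int variable_v = value;
    int *pointer_p = &variable_v;      /* pointer_p points to variable_v */
    int **pointer_to_p = &pointer_p;   /* pointer_p has an address of its own */

    /* **pointer_to_p goes: pointer_to_p -> pointer_p -> variable_v */
    return **pointer_to_p;
}

/* Returns 1: pointer_p and variable_v are distinct objects, so their
   addresses compare unequal. */
int pointer_and_target_are_distinct(void)
{
    int variable_v = 0;
    int *pointer_p = &variable_v;
    return (void *)&pointer_p != (void *)&variable_v;
}
```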

Harikrishnan
  • 20
    Not only does this not address the notion of address as opposed to pointer, but it integrally misses the point that an address is not just an integer. – Gilles 'SO- stop being evil' Mar 01 '13 at 22:08
  • 21
    -1, this just explains what a pointer is. That was not the question-- and you're pushing aside all the complexities that the question _is_ about. – alexis Mar 01 '13 at 22:54
37

To think of a pointer as an address is an approximation. Like all approximations, it's good enough to be useful sometimes, but it's also not exact which means that relying on it causes trouble.

A pointer is like an address in that it indicates where to find an object. One immediate limitation of this analogy is that not all pointers actually contain an address. NULL is a pointer which is not an address. The content of a pointer variable can in fact be of one of three kinds:

  • the address of an object, which can be dereferenced (if p contains the address of x then the expression *p has the same value as x);
  • a null pointer, of which NULL is an example;
  • invalid content, which doesn't point to an object (if p doesn't hold a valid value, then *p could do anything (“undefined behavior”), with crashing the program a fairly common possibility).
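Of these three kinds, only the null case can be detected at run time. A minimal sketch of the usual guard (the function name is hypothetical); note that no check can detect the third kind, because an invalid non-null pointer looks like any other value, so it has to be prevented by construction rather than tested for:

```c
#include <stddef.h>

/* Returns *p when p is non-null, otherwise the fallback.
   This only distinguishes the null case; it cannot tell whether
   a non-null p actually points to a live int. */
int load_or_default(const int *p, int fallback)
{
    if (p == NULL)
        return fallback;
    return *p;
}
```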

Furthermore, it would be more accurate to say that a pointer (if valid and non-null) contains an address: a pointer indicates where to find an object, but there is more information tied to it.

In particular, a pointer has a type. On most platforms, the type of the pointer has no influence at runtime, but at compile time its influence goes beyond type checking. If p is a pointer to int (int *p;), then p + 1 points to an integer which is sizeof(int) bytes after p (assuming p + 1 is still a valid pointer). If q is a pointer to char that points to the same address as p (char *q = (char *)p;), then q + 1 is not the same address as p + 1. If you think of pointers as addresses, it is not very intuitive that the “next address” is different for different pointers to the same location.
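The different step sizes can be measured by converting both results to char *, since sizeof(char) is 1 by definition. A minimal sketch; the helper names are hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper: how many bytes p + 1 lies past p, measured through
   char pointers. */
ptrdiff_t int_step_in_bytes(int *p)
{
    return (char *)(p + 1) - (char *)p;
}

/* Same question for a char pointer to the same location: always 1. */
ptrdiff_t char_step_in_bytes(char *q)
{
    return (q + 1) - q;
}
```

For a 4-byte int, int_step_in_bytes returns 4 while char_step_in_bytes returns 1, even though both pointers start at the same location.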

It is possible in some environments to have multiple pointer values with different representations (different bit patterns in memory) that point to the same location in memory. You can think of these as different pointers holding the same address, or as different addresses for the same location — the metaphor isn't clear in this case. The == operator always tells you whether the two operands are pointing to the same location, so on these environments you can have p == q even though p and q have different bit patterns.

There are even environments where pointers carry other information beyond the address, such as type or permission information. You can easily go through your life as a programmer without encountering these.

There are environments where different kinds of pointers have different representations. You can think of it as different kinds of addresses having different representations. For example, some architectures have byte pointers and word pointers, or object pointers and function pointers.

All in all, thinking of pointers as addresses isn't too bad as long as you keep in mind that

  • it's only valid, non-null pointers that are addresses;
  • you can have multiple addresses for the same location;
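  • you can't do arbitrary arithmetic on addresses, and there's no general order on them (C defines pointer arithmetic and relational comparisons only for pointers into, or just past, the same array object);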
  • the pointer also carries type information.

Going the other way round is far more troublesome. Not everything that looks like an address can be a pointer. Somewhere deep down any pointer is represented as a bit pattern that can be read as an integer, and you can say that this integer is an address. But going the other way, not every integer is a pointer.

First, there are some well-known limitations. For example, an integer that designates a location outside your program's address space can't be a valid pointer. A misaligned address doesn't make a valid pointer for a data type that requires alignment; for example, on a platform where int requires 4-byte alignment, 0x7654321 cannot be a valid int* value.
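Both directions can be sketched with uintptr_t from <stdint.h> (an optional type, though present on mainstream platforms). Converting a valid pointer to uintptr_t and back is guaranteed to give a pointer that compares equal to the original; the reverse direction carries no such guarantee. The alignment helper is hypothetical and only checks a necessary condition, working on the integer so that it never has to form a possibly invalid pointer:

```c
#include <stdint.h>

/* Pointer -> uintptr_t -> pointer: the result compares equal to p. */
int round_trips(int *p)
{
    uintptr_t n = (uintptr_t)p;
    int *back = (int *)n;
    return back == p;
}

/* Hypothetical helper: on a flat-address platform, an integer can only be
   a valid int* value if it is a multiple of int's alignment. This is a
   necessary condition, not a sufficient one. */
int is_aligned(uintptr_t addr, uintptr_t alignment)
{
    return addr % alignment == 0;
}
```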

However, it goes well beyond that, because when you make a pointer into an integer, you're in for a world of trouble. A big part of this trouble is that optimizing compilers are far better at microoptimization than most programmers expect, so that their mental model of how a program works is deeply wrong. Just because you have pointers with the same address doesn't mean that they are equivalent. For example, consider the following snippet:

#include <stdio.h>

int main(void) {
    unsigned int x = 0;
    unsigned short *p = (unsigned short*)&x;
    p[0] = 1;
    printf("%u = %u\n", x, *p);
}

You might expect that on a run-of-the-mill machine where sizeof(int)==4 and sizeof(short)==2, this either prints 1 = 1 (little-endian) or 65536 = 1 (big-endian). But on my 64-bit Linux PC with GCC 4.4:

$ c99 -O2 -Wall a.c && ./a.out 
a.c: In function ‘main’:
a.c:6: warning: dereferencing pointer ‘p’ does break strict-aliasing rules
a.c:5: note: initialized from here
0 = 1

GCC is kind enough to warn us what's going wrong in this simple example — in more complex examples, the compiler might not notice. Since p has a different type from &x, changing what p points to cannot affect what &x points to (outside some well-defined exceptions). Therefore the compiler is at liberty to keep the value of x in a register and not update this register as *p changes. The program dereferences two pointers to the same address and obtains two different values!

The moral of this example is that thinking of a (non-null valid) pointer as an address is fine, as long as you stay within the precise rules of the C language. The flip side of the coin is that the rules of the C language are intricate, and difficult to get an intuitive feeling for unless you know what happens under the hood. And what happens under the hood is that the tie between pointers and addresses is somewhat loose, both to support “exotic” processor architectures and to support optimizing compilers.
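One well-defined way to get the behaviour the snippet was hoping for is to copy bytes with memcpy instead of storing through a differently-typed pointer. A minimal sketch (the function name set_first_short is hypothetical): with a 4-byte int and a 2-byte short it should leave x equal to 1 on little-endian machines and 65536 on big-endian ones, regardless of optimization level.

```c
#include <string.h>

/* Writes v into the first sizeof(unsigned short) bytes of *x using memcpy.
   Copying bytes this way is well-defined, unlike storing through an
   unsigned short * that points into an unsigned int, so the compiler
   cannot assume that *x is unchanged. */
void set_first_short(unsigned int *x, unsigned short v)
{
    memcpy(x, &v, sizeof v);
}
```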

So think of pointers being addresses as a first step in your understanding, but don't follow that intuition too far.

Gilles 'SO- stop being evil'
  • 5
    +1. Other answers seem to miss that a pointer comes with type information. This is far more important than the address/ID/whatever discussion. – undur_gongor Mar 01 '13 at 21:09
  • +1 Excellent points about type information. I'm not sure the compiler examples are correct tho... It seems very unlikely, for example, that `*p = 3` is guaranteed to succeed when p has not been initialized. – LarsH Mar 01 '13 at 21:21
  • @LarsH You're right, thanks, how did I write that? I replaced it by an example that even demonstrates the surprising behavior on my PC. – Gilles 'SO- stop being evil' Mar 01 '13 at 21:57
  • 1
    um, NULL is ((void *)0) .. ? – Aniket Inge Mar 02 '13 at 03:56
  • In many implementations, NULL is not a null pointer. It isn't even a pointer. #define NULL (1312-1200-112ull) would be a perfectly fine definition for NULL. NULL is required to be a macro that evaluates to a _null pointer constant_ and in certain situations, null pointer constants are automatically converted to null pointers. – gnasher729 Apr 17 '14 at 09:42
  • 1
    @gnasher729 The null pointer *is* a pointer. `NULL` isn't, but for the level of detail required here, this is an irrelevant distraction. Even for day-to-day programming, the fact that `NULL` may be implemented as something that doesn't say “pointer” doesn't come up often (primarily passing `NULL` to a variadic function — but even there, if you aren't casting it, you're already making the assumption that all pointer types have the same representation). – Gilles 'SO- stop being evil' Apr 17 '14 at 10:06
  • This answers the question. A pointer is not an address; it is a language abstraction of an address. – jds Jan 30 '15 at 03:15
  • NULL compares equal to zero. That is different from (void*)0. – Jeff Hammond Apr 20 '15 at 02:27
19

A pointer is a variable that HOLDS a memory address, not the address itself. However, you can dereference a pointer and get access to the memory location.

For example:

int q = 10; /*say q is at address 0x10203040*/
int *p = &q; /*means let p contain the address of q, which is 0x10203040*/
*p = 20; /*set whatever is at the address pointed by "p" as 20*/

That's it. It's that simple.


A program to demonstrate what I am saying and its output is here:

http://ideone.com/rcSUsb

The program:

#include <stdio.h>

int main(int argc, char *argv[])
{
  /* POINTER AS AN ADDRESS */
  int q = 10;
  int *p = &q;

  printf("address of q is %p\n", (void *)&q);
  printf("p contains %p\n", (void *)p);

  p = NULL;
  printf("NULL p now contains %p\n", (void *)p);
  return 0;
}
Aniket Inge
  • 5
    It can confuse even more. Alice, can you see a cat? No I can see only a smile of a cat. So saying that pointer is an address or pointer is a variable that holds an address or saying that pointer is a name of a concept that refers to the idea of an address, how far book writers can go in confusing neeeewbies? – exebook Mar 01 '13 at 05:56
  • @exebook to those seasoned in pointers, it is quite simple. Maybe a picture will help? – Aniket Inge Mar 01 '13 at 05:57
  • 5
    A pointer does not necessarily hold an address. In a C interpreter, it could be something else, some kind of ID/handle. – Alexey Frunze Mar 01 '13 at 06:06
  • The "label" or variable name is a compiler/assembler and doesn't exist at the machine level so I don't think it should appear in the memory. – Ben Mar 01 '13 at 15:49
  • "A pointer is a variable that HOLDS memory address, not the address itself" - really? http://linux.die.net/man/3/fopen says that `fopen()` returns a FILE pointer. Does `fopen()` return a variable? – LarsH Mar 01 '13 at 21:14
  • No, it's not that simple. For a start, a null pointer doesn't hold an address. More importantly, you can have two pointers that hold the same address, but behave differently (my answer goes into more detail). – Gilles 'SO- stop being evil' Mar 01 '13 at 22:04
  • @LarsH is pointer an **address** or a **variable that holds address**? With `fopen()` lets say an address was returned from it, where do you put the address? can you use the return value from `fopen()` without putting the "address" it returned into a `FILE *` object? – Aniket Inge Mar 03 '13 at 03:18
  • 1
    @Aniket A pointer variable can contain a pointer value. You only need to store the result of `fopen` into a variable if you need to use it more than once (which, for `fopen`, is pretty much all the time). – Gilles 'SO- stop being evil' Mar 04 '13 at 01:08
  • @Aniket: a pointer value is returned. You can put it in a pointer variable if you want, or you could pass it directly to `fclose()`. You normally would put it in a variable, but you don't have to. – LarsH Mar 04 '13 at 03:54
16

It's difficult to tell exactly what the authors of those books mean. Whether a pointer contains an address or not depends on how you define an address and how you define a pointer.

Judging from all the answers that have been written, some people assume that (1) an address must be an integer and (2) a pointer doesn't need to be an address, by virtue of the specification not saying so. With these assumptions, pointers clearly do not necessarily contain addresses.

However, we see that while (2) is probably true, (1) probably doesn't have to be true. And what to make of the fact that the & is called the address of operator as per @CornStalks's answer? Does this mean that the authors of the specification intend for a pointer to contain an address?

So can we say, pointer contains an address, but an address doesn't have to be an integer? Maybe.

I think all of this is gibberish, pedantic semantic talk. It is totally worthless, practically speaking. Can you think of a compiler that generates code in such a way that the value of a pointer is not an address? If so, what? That's what I thought...

I think what the author of the book (the first excerpt, which claims that pointers are not necessarily just addresses) is probably referring to is the fact that a pointer comes with inherent type information.

For example,

 int x;
 int* y = &x;
 char* z = (char*)&x;

Both y and z are pointers, but y+1 and z+1 are different. If they were simply memory addresses, wouldn't those expressions give you the same value?

And herein lies the reason why thinking about pointers as if they were addresses usually leads to grief: bugs get written because people think about pointers as if they were plain addresses.

55555 is probably not a pointer, although it may be an address, but (int*)55555 is a pointer. 55555+1 = 55556, but (int*)55555+1 is 55559 with a 4-byte int (in general, 55555 + sizeof(int)).

thang
  • 1
    +1 for pointing out pointer arithmetic is not the same as arithmetic on addresses. – kutschkem Mar 01 '13 at 13:13
  • In the case of the 16-bit 8086, a memory address is described by a segment base + offset, both 16 bits. There are many combinations of segment base + offset that give the same address in memory. This `far` pointer isn't just "an integer". – vonbrand Mar 01 '13 at 16:39
  • @vonbrand i don't understand why you posted that comment. that issue has been discussed as comments under other answers. just about every other answer assumes that address = integer and anything not integer is not address. i simply point this out and note that it may or may not be correct. my whole point in the answer is that it is not relevant. it's all just pedantic, and the main issue is not being addressed in the other answers. – thang Mar 06 '13 at 22:21
  • @tang, the idea "pointer == address" is **wrong**. That everybody and their favorite aunt continue saying so doesn't make it right. – vonbrand Mar 06 '13 at 22:28
  • @vonbrand, and why did you make that comment under my post? I didn't say it is either right or wrong. In fact, it is right in certain scenarios/assumptions, but not always. Let me summarize again the point of the post (for the second time). **my whole point in the answer is that it is not relevant. it's all just pedantic, and the main issue is not being addressed in the other answers.** it would be more appropriate to comment on the answers that do make the claim that pointer==address or address==integer. see my comments under Alexey's post with respect to segment:offset. – thang Mar 06 '13 at 23:19
  • and by the way, that there's a ton of answers that are mostly wrong, inaccurate, incomplete, and totally miss the point is out of my control. And furthermore, that they get up voted is also out of my control. I have commented on them to clarify the issue before writing this answer. – thang Mar 06 '13 at 23:24
16

Well, a pointer is an abstraction representing a memory location. Note that the quote doesn't say that thinking about pointers as if they were memory addresses is wrong, it just says that it "usually leads to grief". In other words, it leads you to have incorrect expectations.

The most likely source of grief is certainly pointer arithmetic, which is actually one of C's strengths. If a pointer were an address, you'd expect pointer arithmetic to be address arithmetic; but it's not. For example, adding 10 to an address should give you an address that is larger by 10 addressing units; but adding 10 to a pointer increments it by 10 times the size of the kind of object it points to (and that size, as reported by sizeof, already includes any padding needed for alignment). With an int * on an ordinary architecture with 32-bit integers, adding 10 to it would increment it by 40 addressing units (bytes). Experienced C programmers are aware of this and put it to all kinds of good uses, but your author is evidently no fan of sloppy metaphors.
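A minimal sketch of the rule, using a hypothetical struct item that typically carries trailing padding: the byte distance covered by p + n is exactly n * sizeof(struct item).

```c
#include <stddef.h>

/* A struct whose size usually includes trailing padding, so that every
   element of an array of them is correctly aligned for its int member. */
struct item {
    int  i;
    char c;
};

/* Bytes between p and p + n: exactly n * sizeof(struct item), where
   sizeof already counts any padding. p must point into an array with at
   least n more elements (or one past its end). */
ptrdiff_t bytes_advanced(struct item *p, ptrdiff_t n)
{
    return (char *)(p + n) - (char *)p;
}
```

With 4-byte ints, struct item is typically 8 bytes, so advancing 10 elements covers 80 bytes; the same helper written for int * would give the 40 bytes described above.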

There's the additional question of how the contents of the pointer represent the memory location: As many of the answers have explained, an address is not always an int (or long). In some architectures an address is a "segment" plus an offset. A pointer might even contain just the offset into the current segment ("near" pointer), which by itself is not a unique memory address. And the pointer contents might have only an indirect relationship to a memory address as the hardware understands it. But the author of the quote cited doesn't even mention representation, so I think it was conceptual equivalence, rather than representation, that they had in mind.

alexis
12

Here's how I've explained it to some confused people in the past: A pointer has two attributes that affect its behavior. It has a value, which is (in typical environments) a memory address, and a type, which tells you the type and size of the object that it points at.

For example, given:

union {
    int i;
    char c;
} u;

You can have three different pointers all pointing to this same object:

void *v = &u;
int *i = &u.i;
char *c = &u.c;

If you compare the values of these pointers, they're all equal:

v == i && v == c

However, if you increment each pointer, you'll see that the type that they point to becomes relevant.

i++;
c++;
// You can't perform arithmetic on a void pointer, so no v++
(void *)i != (void *)c

The variables i and c will have different values at this point, because i++ causes i to contain the address of the next-accessible integer, and c++ causes c to point to the next-addressable character. Typically, integers take up more memory than characters, so i will end up with a larger value than c after they are both incremented.
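A compilable version of the example, sketched as a single function (the names union both and gap_after_increment are hypothetical). It compares through void *, which C allows between a void * and any object pointer, and measures the gap after incrementing by converting to char *:

```c
#include <stddef.h>

union both {
    int  i;
    char c;
};

/* Returns how many bytes apart the two pointers end up after each is
   incremented once, starting from the same union object. */
ptrdiff_t gap_after_increment(void)
{
    union both u;          /* only addresses are used, never the contents */
    void *v = &u;
    int  *i = &u.i;
    char *c = &u.c;

    /* All three start at the same location. int * and char * cannot be
       compared directly, but each can be compared with a void *. */
    if (!(v == i && v == c))
        return -1;

    i++;   /* advances by sizeof(int) bytes */
    c++;   /* advances by 1 byte */

    return (char *)i - c;
}
```

With a 4-byte int the function returns 3: i ends up 4 bytes past the start of u, c only 1 byte.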

Mark Bessey
  • 2
    +1 Thank you. With pointers, value and type are as inseparable as one can separate mans body from his soul. – Aki Suihkonen Mar 05 '13 at 07:21
  • `i == c` is ill-formed (you can only compare pointers to different types if there is an implicit conversion from one to the other). Further, fixing this with a cast means you have applied a conversion, and then it's debatable whether the conversion changes the value or not. (You could assert that it doesn't, but then that is just asserting the same thing that you were trying to prove with this example). – M.M Mar 02 '15 at 22:25
9

You are right and sane. Normally, a pointer is just an address, so you can cast it to an integer and do any arithmetic.

But sometimes pointers are only part of an address. On some architectures, a pointer is converted to an address by adding a base, or by using another CPU register.

But these days, on PC and ARM architecture with a flat memory model and C language natively compiled, it's OK to think that a pointer is an integer address to some place in one-dimensional addressable RAM.

Peter Mortensen
exebook
  • PC... flat memory model? what are selectors? – thang Mar 01 '13 at 06:20
  • Riight. And when the next architecture change comes around, perhaps with separate code and data spaces, or someone goes back to the venerable segment architecture (which makes tons of sense for security, might even add some key to segment number + offset to check permissions) your lovely "pointers are just integers" comes crashing down. – vonbrand Mar 01 '13 at 16:43
8

Mark Bessey already said it, but this needs to be re-emphasised until understood.

A pointer has as much to do with a variable as a literal 3 does.

A pointer is a tuple of a value (an address) and a type (with additional properties, such as read-only). The type (and the additional parameters, if any) can further define or restrict the context, e.g. __far ptr vs. __near ptr: what is the address relative to? The stack, the heap, a linear address space, an offset from somewhere, physical memory, or something else.

It's the property of type that makes pointer arithmetic a bit different to integer arithmetic.

The counterexamples of a pointer not being a variable are too many to ignore:

  • fopen returning a FILE pointer (where's the variable?);

  • the stack pointer or frame pointer, which are typically unaddressable registers;

  • *(int *)0x1231330 = 13; : casting an arbitrary integer value to a pointer-to-int type and writing/reading an integer without ever introducing a variable.

In the lifetime of a C-program there will be many other instances of temporary pointers that do not have addresses -- and therefore they are not variables, but expressions/values with a compile time associated type.

Community
Aki Suihkonen
7

A pointer, like any other variable in C, is fundamentally a collection of bits which may be represented by one or more concatenated unsigned char values (as with any other type of variable, sizeof(some_variable) will indicate the number of unsigned char values). What makes a pointer different from other variables is that a C compiler will interpret the bits in a pointer as identifying, somehow, a place where a variable may be stored. In C, unlike some other languages, it is possible to request space for multiple variables, and then convert a pointer to any value in that set into a pointer to any other variable within that set.

Many compilers implement pointers by using their bits to store actual machine addresses, but that is not the only possible implementation. An implementation could keep one array (not accessible to user code) listing the hardware address and allocated size of all of the memory objects (sets of variables) which a program was using, and have each pointer contain an index into that array along with an offset from the start of that object. Such a design would allow a system not only to restrict code to operating only upon memory that it owned, but also to ensure that a pointer to one memory item could not be accidentally converted into a pointer to another memory item (in a system that uses hardware addresses, if foo and bar are arrays of 10 items that are stored consecutively in memory, a pointer to the "eleventh" item of foo might instead point to the first item of bar, but in a system where each "pointer" is an object ID and an offset, the system could trap if code tried to index a pointer to foo beyond its allocated range). It would also be possible for such a system to eliminate memory-fragmentation problems, since the physical addresses associated with any pointers could be moved around.
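The object-table idea can be sketched in ordinary C by modelling the "pointer" as a struct. This is a hypothetical illustration, not how any particular compiler works: struct object, struct fat_ptr and fat_read are invented names, and a real implementation would hide the table from user code.

```c
#include <stddef.h>

/* Hypothetical object table entry: where an object really lives and how
   many ints it holds. */
struct object {
    int    *base;
    size_t  count;
};

/* Hypothetical "pointer": an object index plus an element offset. */
struct fat_ptr {
    size_t obj;
    size_t off;
};

/* Returns 1 and stores the value in *out if the access is in range,
   0 (a "trap") if it would run past the end of the object. */
int fat_read(const struct object *table, size_t table_len,
             struct fat_ptr p, int *out)
{
    if (p.obj >= table_len || p.off >= table[p.obj].count)
        return 0;
    *out = table[p.obj].base[p.off];
    return 1;
}
```

Reading the "eleventh" item of foo through such a pointer fails the bounds check instead of silently landing in bar.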

Note that while pointers are somewhat abstract, they're not quite abstract enough to allow a fully standards-compliant C compiler to implement a garbage collector. The C standard specifies that every variable, including pointers, is represented as a sequence of unsigned char values. Given any variable, one can decompose it into a sequence of numbers and later convert that sequence of numbers back into a variable of the original type. Consequently, it would be possible for a program to calloc some storage (receiving a pointer to it), store something there, decompose the pointer into a series of bytes, display those on the screen, and then erase all references to them. If the program then accepted some numbers from the keyboard, reconstituted those into a pointer, and then tried to read data from that pointer, and if the user entered the same numbers that the program had earlier displayed, the program would be required to output the data that had been stored in the calloc'ed memory. Since there is no conceivable way the computer could know whether the user had made a copy of the numbers that were displayed, there would be no conceivable way the computer could know whether the aforementioned memory might ever be accessed in the future.
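The decompose-and-rebuild step can be sketched with memcpy. This is a minimal illustration with a hypothetical helper reconstitute; displaying the bytes and reading them back from the keyboard is left out, but the round trip through unsigned char values is the part the argument relies on:

```c
#include <stdlib.h>
#include <string.h>

/* Copies a pointer's representation into a byte buffer and back again.
   Copying an object's representation this way preserves its value, so
   the rebuilt pointer is as usable as the original. */
int *reconstitute(int *p)
{
    unsigned char bytes[sizeof p];
    int *q;

    memcpy(bytes, &p, sizeof p);   /* decompose into unsigned char values */
    memcpy(&q, bytes, sizeof q);   /* rebuild a pointer from those values */
    return q;
}
```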

supercat
  • At great overhead, maybe you could detect any use of the pointer value that might "leak" its numeric value, and pin the allocation so that the garbage collector won't collect or relocate it (unless `free` is called explicitly, of course). Whether the resulting implementation would be all that useful is another matter, since its ability to collect might be too limited, but you could at least call it a garbage collector :-) Pointer assignment and arithmetic wouldn't "leak" the value, but any access to a `char*` of unknown origin would have to be checked. – Steve Jessop Jan 30 '15 at 13:39
  • @SteveJessop: I think such a design would be worse than useless, since it would be impossible for code to know what pointers needed to be freed. Garbage-collectors that assume anything that looks like a pointer is one may be overly conservative, but generally things that look like--but aren't--pointers have a possibility of changing, thus avoiding "permanent" memory leaks. Having any action that looks like it's decomposing a pointer to bytes permanently freeze the pointer is a guaranteed recipe for memory leaks. – supercat Jan 30 '15 at 13:56
  • I think it would fail anyway for performance reasons -- if you want your code to run that slowly because every access is checked then don't write it in C ;-) I have higher hopes for the ingenuity of C programmers than you do, since I think while inconvenient it's probably not implausible to avoid pinning allocations unnecessarily. Anyway, C++ defines "safely derived pointers" precisely in order to deal with this issue, so we know what to do if we ever want to increase the abstractness of C pointers to the level where they support reasonably effective garbage collection. – Steve Jessop Jan 30 '15 at 14:16
  • @SteveJessop: For a GC system to be useful, it should either be able to reliably release memory upon which `free` has not been called, or prevent any reference to a freed object from becoming a reference to a live object [even when using resources that require explicit lifetime management, GC can still usefully perform the latter function]; a GC system which sometimes falsely regard objects as having live references to them may be usable *if the probability of N objects being needlessly pinned simultaneously approaches zero as N gets large*. Unless one is willing to flag a compiler error... – supercat Jan 30 '15 at 16:33
  • ...for code which is valid C++, but for which the compiler would be unable to prove that a pointer can never get converted into an unrecognizable form, I don't see how one could avoid the risk that a program which in fact never uses pointers as integers might be falsely regarded as doing so. – supercat Jan 30 '15 at 16:37
  • For a GC system to be useful it has to be able to collect a subset of objects which the programmer ensures are collectible, by treating them carefully. For it to be *extremely* useful, in the way that the Java GC is useful, it does that even for carelessly-treated objects ;-) There are many programs in which it would be simple to ensure that for some entity type, its objects' addresses are never "leaked", but difficult (or impossible) for the compiler to know that this is what the programmer has ensured. – Steve Jessop Feb 02 '15 at 15:31
  • @SteveJessop: It might be possible in C++ to have a form of GC in which every GC object encapsulated some primitives and some GC references, and in which all GC references not held within GC objects had to be held in lifetime-managed objects [every "outside-reference" object would be considered a GC root; failing to call the destructor of an "outside" GC reference would leak that object and all objects directly or indirectly referenced thereby]. The thing I don't know is how well one could avoid having a statement like `foo->ref1 = bar->ref2` [where `foo` and `bar` are both `OutsideGcRef`]... – supercat Feb 02 '15 at 16:44
  • ...need to construct an `OutsideGcRef` to hold the reference temporarily. Ideally, the type of `bar->ref2` would not be constructable except by the GC, but would be convertible to an `OutsideGcRef`, and would override the assignment operator so as to copy the reference directly without having to create an `OutsideGcRef` first. – supercat Feb 02 '15 at 16:47
6

A pointer is a variable type that is natively available in C/C++ and contains a memory address. Like any other variable it has an address of its own and takes up memory (the amount is platform specific).

One problem you will see as a result of the confusion is trying to change the referent within a function by simply passing the pointer by value. This will make a copy of the pointer at function scope and any changes to where this new pointer "points" will not change the referent of the pointer at the scope that invoked the function. In order to modify the actual pointer within a function one would normally pass a pointer to a pointer.
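A minimal sketch of the difference, with hypothetical function names: the first function only changes its own copy of the pointer, while the second receives a pointer to the caller's pointer.

```c
/* Changes only the local copy of the pointer: the caller's pointer is
   unaffected (although *p could still be used to change the pointee). */
void repoint_copy(int *p, int *target)
{
    p = target;
    (void)p;   /* silence "set but not used" warnings */
}

/* Receives the address of the caller's pointer, so it can change where
   the caller's pointer points. */
void repoint(int **pp, int *target)
{
    *pp = target;
}
```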

Matthew Sanders
  • 1
    Generally, it's a handle/ID. Usually, it's a plain address. – Alexey Frunze Mar 01 '13 at 06:08
  • I adjusted my answer to be a bit more PC to the definition of [Handle](http://en.wikipedia.org/wiki/Handle_(computing)) in wikipedia. I like to refer to pointers as a particular instance of a handle, as a handle may simply be a reference to a pointer. – Matthew Sanders Mar 01 '13 at 06:16
6

BRIEF SUMMARY:

(0) Thinking of pointers as addresses is often a good learning tool, and is often the actual implementation for pointers to ordinary data types.

(1) But on many, perhaps most, compilers pointers to functions are not addresses, but are bigger than an address (typically 2x, sometimes more), or are actually pointers to a struct in memory that contains the addresses of functions and stuff like a constant pool.

(2) Pointers to data members and pointers to methods are often even stranger.

(3) Legacy x86 code with FAR and NEAR pointer issues

(4) Several examples, most notably the IBM AS/400, with secure "fat pointers".

I am sure you can find more.

DETAIL:

UMMPPHHH!!!!! Many of the answers so far are fairly typical "programmer weenie" answers - but not compiler weenie or hardware weenie. Since I pretend to be a hardware weenie, and often work with compiler weenies, let me throw in my two cents:

On many, probably most, C compilers, a pointer to data of type T is, in fact, the address of T.

Fine.

But, even on many of these compilers, certain pointers are NOT addresses. You can tell this by looking at sizeof(ThePointer).

For example, pointers to functions are sometimes quite a lot bigger than ordinary addresses. Or, they may involve a level of indirection. This article provides one description, involving the Intel Itanium processor, but I have seen others. Typically, to call a function you must know not only the address of the function code, but also the address of the function's constant pool - a region of memory from which constants are loaded with a single load instruction, rather than the compiler having to generate a 64 bit constant out of several Load Immediate and Shift and OR instructions. So, rather than a single 64 bit address, you need 2 64 bit addresses. Some ABIs (Application Binary Interfaces) move this around as 128 bits, whereas others use a level of indirection, with the function pointer actually being the address of a function descriptor that contains the 2 actual addresses just mentioned. Which is better? Depends on your point of view: performance, code size, and some compatibility issues - often code assumes that a pointer can be cast to a long or a long long, but may also assume that the long long is exactly 64 bits. Such code may not be standards compliant, but nevertheless customers may want it to work.
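Checking sizeof is easy to sketch; the helper names below are hypothetical. Note the limitation: on an ABI that uses function descriptors, a function pointer can be the same size as a data pointer, so sizeof only reveals the wider-pointer case.

```c
#include <stddef.h>

/* Size of an ordinary data pointer on this implementation. */
size_t data_pointer_size(void)
{
    return sizeof(void *);
}

/* Size of a pointer to a function taking and returning nothing. */
size_t function_pointer_size(void)
{
    return sizeof(void (*)(void));
}
```

On typical 64-bit desktop platforms both functions return 8.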

Many of us have painful memories of the old Intel x86 segmented architecture, with NEAR POINTERs and FAR POINTERS. Thankfully these are nearly extinct by now, so only a quick summary: in 16 bit real mode, the actual linear address was

LinearAddress = SegmentRegister[SegNum] * 16 + Offset

Whereas in protected mode, it might be

LinearAddress = SegmentRegister[SegNum].base + offset

with the resulting address being checked against a limit set in the segment. Some programs used non-standard C/C++ FAR and NEAR pointer declarations, but many just said T *; there were compiler and linker switches so that, for example, code pointers might be near pointers, just a 32-bit offset against whatever is in the CS (Code Segment) register, while data pointers might be FAR pointers, specifying both a 16-bit segment number and a 32-bit offset for a 48-bit value. Now, both of these quantities are certainly related to the address, but since they aren't the same size, which of them is the address? Moreover, the segments also carried permissions (read-only, read-write, executable) in addition to stuff related to the actual address.
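The real-mode formula is simple enough to sketch in C (the helper name is hypothetical). It also shows why a segment:offset pair is not simply an integer: different pairs can name the same linear address.

```c
#include <stdint.h>

/* 16-bit real mode: linear address = segment * 16 + offset. The result
   needs up to 21 bits (at most 0xFFFF * 16 + 0xFFFF = 0x10FFEF). */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Both 0x1234:0x0010 and 0x1235:0x0000 map to linear address 0x12350.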

A more interesting example, IMHO, is (or, perhaps, was) the IBM AS/400 family. This computer was one of the first to implement an OS in C++. Pointers on this machine were typically 2X the actual address size; e.g. as this presentation says, 128-bit pointers, but the actual addresses were 48-64 bits, and, again, some extra info, what is called a capability, that provided permissions such as read and write, as well as a limit to prevent buffer overflow. Yes: you can do this compatibly with C/C++, and if this were ubiquitous, the Chinese PLA and Slavic mafia would not be hacking into so many Western computer systems. But historically most C/C++ programming has neglected security for performance. Most interestingly, the AS400 family allowed the operating system to create secure pointers that could be given to unprivileged code, but which the unprivileged code could not forge or tamper with. Again, this is about security: such a system is standards compliant, but much sloppy, non-standards-compliant C/C++ code will not work on it. Again, there are official standards, and there are de-facto standards.

Now, I'll get off my security soapbox and mention some other ways in which pointers (of various types) are often not really addresses: pointers to data members, pointers to member functions (methods), and the static versions thereof can be bigger than an ordinary address. As this post says:

There are many ways of solving this [problems related to single versus multiple inheritance, and virtual inheritance]. Here's how the Visual Studio compiler decides to handle it: A pointer to a member function of a multiply-inherited class is really a structure. And they go on to say, "Casting a function pointer can change its size!"

As you can probably guess from my pontificating on (in)security, I've been involved in C/C++ hardware/software projects where a pointer was treated more like a capability than a raw address.

I could go on, but I hope you get the idea.

BRIEF SUMMARY (which I will also put at the top):

(0) Thinking of pointers as addresses is often a good learning tool, and is often the actual implementation for pointers to ordinary data types.

(1) But on many, perhaps most, compilers, pointers to functions are not addresses, but are bigger than an address (typically 2X, sometimes more), or are actually pointers to a struct in memory that contains the address of the function and stuff like a constant pool.

(2) Pointers to data members and pointers to methods are often even stranger.

(3) Legacy x86 code with FAR and NEAR pointer issues

(4) Several examples, most notably the IBM AS/400, with secure "fat pointers".

I am sure you can find more.

Mifeet
Krazy Glew
  • In 16 bit real mode `LinearAddress = SegmentRegister.Selector * 16 + Offset` (note times 16, not shift by 16). In protected mode `LinearAddress = SegmentRegister.base + offset` (no multiplication of any kind; the segment base is stored in the GDT/LDT and cached in the segment register *as is*). – Alexey Frunze Mar 28 '13 at 10:48
  • You are also correct about the segment base. I had misremembered. It is the segment limit that is optionally multiplied by 4K. The segment base just needs to be unscrambled by the hardware when it loads a segment descriptor from memory into a segment register. – Krazy Glew Mar 29 '13 at 20:34
4

A pointer is just another variable which is used to hold the address of a memory location (usually the memory address of another variable).
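A minimal C illustration of that definition, assuming a hosted implementation; `pointer_demo` is just a made-up name. The printed values are whatever the implementation uses to show a pointer with `%p`, which is not required to look like a machine address.

```c
#include <stdio.h>

/* A pointer variable holds the address of another object (here, x),
   and, being an object itself, it has its own address (&p). */
int pointer_demo(void)
{
    int x = 42;
    int *p = &x;    /* p holds the address of x */
    int **pp = &p;  /* the pointer itself has an address too */

    printf("address of x:      %p\n", (void *)&x);
    printf("value stored in p: %p\n", (void *)p);
    printf("address of p:      %p\n", (void *)&p);

    *p = 7;         /* writing through p modifies x */
    return **pp == x ? x : -1;
}
```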

Tuxdude
  • So, the pointee is actually a memory address? You disagree with the author? Just trying to understand. – d0rmLife Mar 01 '13 at 05:52
  • The primary function of the pointer is to point to something. How exactly that is achieved and whether there is a real address or not, is not defined. A pointer could be just an ID/handle, not a real address. – Alexey Frunze Mar 01 '13 at 06:05
3

Before understanding pointers we need to understand objects. Objects are entities which exist and have a location specifier called an address. A pointer is just a variable like any other variable in C, with a type called pointer, whose content is interpreted as the address of an object, and which supports the following operations.

+ : A variable of type integer (usually called offset) can be added to yield a new pointer
- : A variable of type integer (usually called offset) can be subtracted to yield a new pointer
  : A variable of type pointer can be subtracted from another pointer to yield an integer (usually called offset)
* : De-referencing. Retrieve the value of the variable (called address) and map it to the object the address refers to.
++: It's just `+= 1`
--: It's just `-= 1`
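The operations listed above can be sketched on an ordinary int array. `pointer_ops_demo` is a hypothetical name; the point is that offsets are counted in elements, not bytes.

```c
#include <stddef.h>

/* Exercises the pointer operations listed above on an int array.
   Returns 1 if every check holds. */
int pointer_ops_demo(void)
{
    int a[5] = {10, 20, 30, 40, 50};
    int *p = a;          /* points at a[0] */
    int *q = p + 3;      /* pointer + integer: points at a[3] */
    int *r = q - 1;      /* pointer - integer: points at a[2] */
    ptrdiff_t d = q - p; /* pointer - pointer: element count, here 3 */

    p++;                 /* same as p += 1: now points at a[1] */
    q--;                 /* same as q -= 1: now points at a[2] */

    return *r == 30      /* dereference yields the object pointed at */
        && d == 3
        && *p == 20
        && q == r;
}
```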

A pointer is classified based on the type of object it is currently referring to. For the operations above, the only part of that information that matters is the size of the object.

Any object supports an operation, & (address of), which retrieves the location specifier (address) of the object as a value of pointer type. This should ease the confusion surrounding the nomenclature: & is an operation on an object, not on a pointer, and its result is a pointer whose type is derived from the object's type.
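A small sketch of the point that the type produced by `&` depends on the object: `&arr` and `&arr[0]` start at the same place, but adding 1 moves each by a different number of bytes. The function name is illustrative only.

```c
#include <stddef.h>

/* & applied to an object yields a pointer whose type follows the object's type.
   &arr and &arr[0] designate the same location but have different types,
   so adding 1 to each moves by a different number of bytes. */
int address_of_demo(void)
{
    int arr[4] = {0};
    int (*whole)[4] = &arr;   /* pointer to the whole array */
    int *first = &arr[0];     /* pointer to the first element */

    int same_start = (void *)whole == (void *)first;
    /* Byte distance moved by "+ 1" for each pointer type. */
    ptrdiff_t step_whole = (char *)(whole + 1) - (char *)whole;
    ptrdiff_t step_first = (char *)(first + 1) - (char *)first;

    return same_start
        && step_whole == (ptrdiff_t)sizeof arr
        && step_first == (ptrdiff_t)sizeof arr[0];
}
```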

Note: throughout this explanation, I have left out the concept of memory.

Peter Mortensen
Abhijit
  • I like your explanation on the abstract reality of a general pointer in a general system. But, perhaps discussing memory would be helpful. In fact, speaking for myself, I know it would...! I think discussing the connection can be very helpful for understanding the big picture. +1 anyways :) – d0rmLife Mar 01 '13 at 20:44
  • @d0rmLife: You have enough explanation in the other answers, which cover the bigger picture. I just wanted to give a mathematical, abstract explanation as another view. Also, IMHO, calling `&` 'address of' would create less confusion, as that is tied to an object rather than to the pointer per se. – Abhijit Mar 01 '13 at 20:48
  • No offense, but I will decide for myself what sufficient explanation is. One **textbook** is not sufficient to fully explain data structures and memory allocation. ;) .... anyways, your answer is **still helpful**, *even if it isn't novel.* – d0rmLife Mar 01 '13 at 20:51
  • It makes no sense to handle _pointers_ without the concept of _memory_. If the object exists without memory, it must be in a place, where there is no address -- e.g. in registers. To be able to use '&' presupposes memory. – Aki Suihkonen Mar 05 '13 at 07:15
3

A C pointer is very similar to a memory address but with machine-dependent details abstracted away, as well as some features not found in the lower level instruction set.

For example, a C pointer is relatively richly typed. If you increment a pointer through an array of structures, it nicely jumps from one structure to the other.
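A sketch of that stepping behaviour, using a hypothetical `struct point`: the byte distance covered by `p + 1` is exactly `sizeof(struct point)`, whatever padding the compiler chose, so a loop with `++p` visits each structure in turn.

```c
#include <stddef.h>

struct point { double x, y; int tag; };  /* hypothetical example struct */

/* ++p jumps over one whole struct, landing on the next array element. */
int sum_tags(const struct point *p, size_t n)
{
    int total = 0;
    for (const struct point *end = p + n; p != end; ++p)
        total += p->tag;
    return total;
}

/* Byte distance between consecutive elements, as seen through char pointers. */
size_t struct_step_bytes(void)
{
    struct point pts[3] = {{0, 0, 1}, {1, 1, 2}, {2, 2, 3}};
    struct point *p = pts;
    struct point *next = p + 1;
    return (size_t)((const char *)next - (const char *)p);
}
```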

Pointers are subject to conversion rules and provide compile time type checking.

There is a special "null pointer" value which is portable at the source code level, but whose representation may differ. If you assign an integer constant whose value is zero to a pointer, that pointer takes on the null pointer value. Ditto if you initialize a pointer that way.

A pointer can be used as a boolean variable: it tests true if it is other than null, and false if it is null.

In a machine language, if the null pointer is a funny address like 0xFFFFFFFF, then you may have to have explicit tests for that value. C hides that from you. Even if the null pointer is 0xFFFFFFFF, you can test it using if (ptr != 0) { /* not null! */}.
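In source code, these three null tests are equivalent regardless of the null pointer's bit pattern. This is a minimal sketch; `is_null_three_ways` is a made-up helper that returns -1 only if the tests were to disagree, which a conforming compiler should never let happen.

```c
#include <stddef.h>

/* Whatever bit pattern the implementation uses for a null pointer,
   these source-level tests are equivalent: 0 and NULL are null pointer
   constants, and a pointer in a boolean context is false only when null. */
int is_null_three_ways(const int *ptr)
{
    int a = (ptr == 0);
    int b = (ptr == NULL);
    int c = !ptr;
    return (a == b && b == c) ? a : -1;  /* -1 would mean the tests disagreed */
}
```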

Uses of pointers which subvert the type system lead to undefined behavior, whereas similar code in machine language might be well defined. Assemblers will assemble the instructions you have written, but C compilers will optimize based on the assumption that you haven't done anything wrong. If a float *p pointer points to a long n variable, and *p = 0.0 is executed, the compiler is not required to handle this. A subsequent use of n will not necessarily read the bit pattern of the float value; perhaps it will be an optimized access based on the "strict aliasing" assumption that n has not been touched! That is, the assumption that the program is well-behaved, and so p should not be pointing at n.

In C, pointers to code and pointers to data are different, but on many architectures, the addresses are the same. C compilers can be developed which have "fat" pointers, even though the target architecture does not. Fat pointers means that pointers are not just machine addresses, but contain other info, such as information about the size of the object being pointed at, for bounds checking. Portably written programs will easily port to such compilers.
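To give a feel for what a fat pointer carries, here is a library-level sketch in plain C. The type `struct fat_int_ptr` and its functions are hypothetical; a compiler implementing fat pointers would do this transparently for ordinary `int *`, and real designs differ in layout and in what happens on a bounds violation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": besides the machine address it carries the
   bounds of the object it points into, so every access can be checked. */
struct fat_int_ptr {
    int   *base;   /* start of the pointed-to array */
    size_t count;  /* number of elements in that array */
    size_t index;  /* current position within the array */
};

struct fat_int_ptr fat_from_array(int *arr, size_t count)
{
    struct fat_int_ptr fp = { arr, count, 0 };
    return fp;
}

/* Pointer "+ n" only moves the index; bounds are checked on access. */
struct fat_int_ptr fat_add(struct fat_int_ptr fp, size_t n)
{
    fp.index += n;
    return fp;
}

/* Checked dereference: returns false instead of reading out of bounds. */
bool fat_load(struct fat_int_ptr fp, int *out)
{
    if (fp.index >= fp.count)
        return false;
    *out = fp.base[fp.index];
    return true;
}
```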

So you can see, there are many semantic differences between machine addresses and C pointers.

Kaz
  • NULL pointers don't work the way you think they do on all platforms - please see my reply to CiscoIPPhone above. NULL == 0 is an assumption that only holds on x86 based platforms. Convention says that new platforms should match x86, however particularly in the embedded world this is not so. Edit: Also, C does nothing to abstract the value of a pointer way from hardware - "ptr != 0" will not work as a NULL test on a platform where NULL != 0. – DX-MON Mar 02 '13 at 12:49
  • DX-MON, that's completely wrong for standard C. NULL is defined to be 0, and they can be used interchangeably in statements. Whether or not the NULL pointer representation in hardware is all 0 bits is irrelevant to how it's represented in source code. – Mark Bessey Mar 02 '13 at 19:12
  • @DX-MON I'm afraid you are not working with the correct facts. In C, an integral constant expression serves as a null pointer constant, regardless of whether the null pointer is the null address. If you know of a C compiler where `ptr != 0` is not a null test, please reveal its identity (but before you do that, send a bug report to the vendor). – Kaz Mar 02 '13 at 21:07
  • I see what you're getting at, but your comments about null pointers are incoherent because you're **confusing pointers and memory addresses**-- exactly what the quote cited in the question advises avoiding! The correct statement: C defines the null pointer to be zero, regardless of whether a memory address at offset zero is legal or not. – alexis Mar 05 '13 at 13:05
  • "If a float *p pointer points to a long n variable, and *p = 0.0 is executed, the compiler is not required to handle this." What do you mean? Aliasing problems result from (non-omniscient) optimization; they have nothing to do with the type of the pointed-to variable, or with the address-pointer relationship really. – alexis Mar 05 '13 at 13:20
  • @alexis Chapter and verse, please. C does not define the null pointer to be zero. C defines zero (or any integral constant expression whose value is zero) as a *syntax* for denoting a null pointer constant. http://www.faqs.org/faqs/C-faq/faq/ (section 5). – Kaz Mar 05 '13 at 15:56
  • @alexis Yes, aliasing is connected to type, in the following way. C forbids most instances of type-punned aliasing, and that allows for aggressive optimizations. If an object of type `int` is assigned to, the compiler doesn't have to be concerned that the assignment might modify anything that has type `double` (a situation which can only happen via some kind of aliasing). Of course, aliasing can take place when the type is the same also, e.g. aliased arrays. There are different issues there, but related. – Kaz Mar 05 '13 at 16:01
  • @Kaz, you're right that my formulation was oversimplified: It's not about what the null pointer "is", meaning its internal storage representation. You get a null pointer by setting it to zero, the null pointer constant, and it compares equal to zero, so as far as C is concerned, it _is_ zero-- actually I think your answer makes pretty much the same point, but from a different perspective. (In other words: I'm not saying you're factually wrong, but I'd draw a very different distinction). – alexis Mar 05 '13 at 21:41
  • @alexis A null pointer does not compare equal to *zero*. It compares equal to a null pointer constant, which is only notated by an integral zero at compile time. A null pointer is not equal to an `int` type variable whose value is zero. Such a comparison requires a diagnostic. So no, a null pointer is not zero as far as C is concerned. – Kaz Mar 07 '13 at 08:01
  • @Kaz so in that case, how the hell on an embedded system would you dereference memory address 0? Cast a long set to 0 to a pointer? I think you'll find that if you try any of that sort of funk with the Xilinx version of GCC, NULL will be set to something more sensible for the platform such as 0xFFFFFFFF as that is usually an unused and therefore invalid address in the processor memory address map of a Microblaze. Memory address zero is actually surprisingly important on a Microblaze due to the interrupt vector system, not being able to access it sounds like an immediate "shot in foot" – DX-MON Mar 18 '13 at 09:30
  • @DX-MON Dereferencing memory address 0 is outside of portable C. Typically, you would do it like this: `TYPE *ptr = (TYPE *) 0` and then later `*ptr`. That depends on the null pointer corresponding to address zero. But that dependency is true of your embedded system. The code is portable only to that system anyway, so everything is cool. If `(TYPE *) 0` produced the address `0x7FFFFFFF` or whatever, then you'd be on a different system where you'd have a workaround for that, like using a non-constant zero: `int zero = 0; TYPE *p = (TYPE *) zero`. – Kaz Feb 02 '15 at 04:49
3

An address is used to identify a piece of fixed-size storage, usually one per byte, as an integer. This is precisely called a byte address, which is also what ISO C uses. There can be other ways to construct an address, e.g. one per bit. However, byte addresses are so common that we usually omit "byte".

Technically, an address is never a value in C, because the definition of term "value" in (ISO) C is:

precise meaning of the contents of an object when interpreted as having a specific type

(Emphasized by me.) However, there is no such "address type" in C.

A pointer is not the same thing. "Pointer" is a kind of type in the C language, and there are several distinct pointer types. They do not necessarily obey an identical set of rules of the language, e.g. the effect of ++ on a value of type int* vs. char*.

A value in C can be of a pointer type. This is called a pointer value. To be clear, a pointer value is not a pointer in the C language. But we are accustomed to mixing them up, because in C this is unlikely to be ambiguous: if we call an expression p a "pointer", it is merely a pointer value and not a type, since a named type in C is not expressed by an expression, but by a type-name or a typedef-name.

Some other things are subtle. As a C user, firstly, one should know what object means:

region of data storage in the execution environment, the contents of which can represent values

An object is an entity that represents values, which are of a specific type. A pointer type is an object type. So if we declare int* p;, then p means "an object of pointer type", or a "pointer object".

Note that "variable" is not normatively defined by the standard (in fact, ISO C never uses it as a noun in normative text). However, informally, we call an object a variable, as some other languages do. (This is still not exact, though: e.g. in C++ a variable can normatively be of reference type, which is not an object.) The phrases "pointer object" or "pointer variable" are sometimes treated like "pointer value" above, with a probable slight difference. (Another set of examples involves "array".)

Since pointer is a type, and address is effectively "typeless" in C, a pointer value roughly "contains" an address. And an expression of pointer type can yield an address, e.g.

ISO C11 6.5.3.2

3 The unary & operator yields the address of its operand.

Note this wording was introduced by WG14/N1256, i.e. ISO C99:TC3. In C99 there is

3 The unary & operator returns the address of its operand.

It reflects the committee's opinion: an address is not a pointer value returned by the unary & operator.

Despite the wording above, there is still some mess even in the standards.

ISO C11 6.6

9 An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator

ISO C++11 5.19

3 ... An address constant expression is a prvalue core constant expression of pointer type that evaluates to the address of an object with static storage duration, to the address of a function, or to a null pointer value, or a prvalue core constant expression of type std::nullptr_t. ...

(The recent C++ standard draft uses different wording, so it does not have this problem.)

Actually, both "address constant" in C and "address constant expression" in C++ are constant expressions of pointer type (or at least "pointer-like" types since C++11).

And the built-in unary & operator is called "address-of" in C and C++; similarly, std::addressof was introduced in C++11.

This naming may cause misconceptions. The resulting expression is of pointer type, so it should be interpreted as: the result contains/yields an address, rather than is an address.

FrankHB
2

A pointer is just another variable that can contain a memory address, usually that of another variable. Since a pointer is a variable, it too has a memory address.

Xavier DSouza
2

It says "because it confuses those who don't know what addresses are about", and that's true: if you learn what addresses are about, you won't be confused. Theoretically, a pointer is a variable which points to another; practically, it holds an address, which is the address of the variable it points to. I don't know why we should hide this fact; it's not rocket science. If you understand pointers, you'll be one step closer to understanding how computers work. Go ahead!

ern0
2

Come to think of it, I think it's a matter of semantics. I don't think the author is right, since the C standard refers to a pointer as holding an address of the referenced object, as others have already mentioned here. However, address != memory address. Per the C standard, an address can be really anything, although it will eventually lead to a memory address: the pointer itself can be an ID, an offset + selector (x86), really anything, as long as it can describe (after mapping) any memory address in the addressable space.

Rad'Val
  • A pointer *holds* an address (or doesn't, if it's null). But that's a far cry from it *being* an address: for example, two pointers to the same address but with a different type are not equivalent in many situations. – Gilles 'SO- stop being evil' Mar 01 '13 at 21:58
  • @Gilles If you see "being", as in `int i=5`-> i *is* 5 then, the pointer is the address yes. Also, null has an address as well. Usually an invalid write address (but not necessarily, see x86-real mode), but an address none the less. There are really only 2 requirements for null: it's guaranteed to compare unequal to a pointer to an actual object and any two null pointers will compare equal. – Rad'Val Mar 01 '13 at 22:14
  • On the contrary, a null pointer is guaranteed not to be equal to the address of any object. Dereferencing a null pointer is undefined behavior. A big problem with saying that “the pointer is the address” is that they work differently. If `p` is a pointer, `p+1` is not always the address incremented by 1. – Gilles 'SO- stop being evil' Mar 01 '13 at 22:19
  • Read again the comment please, `it's guaranteed to compare unequal to a pointer to an actual object`. As for the pointer arithmetics, I don't see the point, the value of the pointer is still an address, even if the "+" operation will not necessarily add one byte to it. – Rad'Val Mar 01 '13 at 22:46
1

There is one other way in which a C or C++ pointer differs from a simple memory address, due to the different pointer types, that I haven't seen in the other answers (although, given their total size, I may have overlooked it). It is probably the most important one, because even experienced C/C++ programmers can trip over it:

The compiler may assume that pointers of incompatible types do not point to the same address even if they clearly do, which may give behaviour that would not be possible with a simple pointer==address model. Consider the following code (assuming sizeof(int) == 2*sizeof(short)):

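#include <stdio.h>

int main(void)
{
  unsigned int i = 0;
  unsigned short* p = (unsigned short*)&i;  // alias i through an incompatible type
  p[0]=p[1]=1;                              // strict-aliasing violation: undefined behaviour

  if (i == 2 + (unsigned short)(-1))
  {
    puts("i changed");     // you'd expect this to execute, but it need not
  }

  if (i == 0)
  {
    puts("i is still 0");  // you'd expect this not to execute, but it actually may do so
  }
  return 0;
}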

Note that there's an exception for char*, so manipulating values using char* is possible (although not very portable).
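A minimal sketch of that exception: reading and writing an object's bytes through `unsigned char *` is well defined. The order in which the bytes are laid out is implementation-defined, which is the non-portable part; copying all of them back preserves the value. `copy_via_bytes` is a made-up name.

```c
#include <stddef.h>

/* Character types may alias any object, so reading (or writing) an object's
   bytes through unsigned char * is well defined; here the bytes of one
   unsigned int are copied into another, one byte at a time. */
unsigned int copy_via_bytes(unsigned int src)
{
    unsigned int dst = 0;
    const unsigned char *from = (const unsigned char *)&src;
    unsigned char *to = (unsigned char *)&dst;
    for (size_t k = 0; k < sizeof src; ++k)
        to[k] = from[k];
    return dst;
}
```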

celtschk
0

Quick summary: A C address is a value, typically represented as a machine-level memory address, with a specific type.

The unqualified word "pointer" is ambiguous. C has pointer objects (variables), pointer types, pointer expressions, and pointer values.

It's very common to use the word "pointer" to mean "pointer object", and that can lead to some confusion -- which is why I try to use "pointer" as an adjective rather than as a noun.

The C standard, at least in some cases, uses the word "pointer" to mean "pointer value". For example, the description of malloc says it "returns either a null pointer or a pointer to the allocated space".
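A short sketch mapping those terms onto code; `pointer_terms_demo` is a hypothetical function, and the comments label which sense of "pointer" each line involves.

```c
#include <stdlib.h>

/* Hypothetical helper illustrating the distinct senses of "pointer". */
int pointer_terms_demo(void)
{
    int x = 5;
    int *p;          /* p is a pointer object; int * is a pointer type */
    p = &x;          /* &x is a pointer expression; its value is the address of x */

    int *mem = malloc(sizeof *mem);  /* either a null pointer or a pointer to the space */
    if (mem == NULL)
        return -1;   /* a null pointer value is not the address of any object */
    *mem = *p + 1;
    int result = *mem;
    free(mem);
    return result;
}
```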

So what's an address in C? It's a pointer value, i.e., a value of some particular pointer type. (Except that a null pointer value is not necessarily referred to as an "address", since it isn't the address of anything).

The standard's description of the unary & operator says it "yields the address of its operand". Outside the C standard, the word "address" is commonly used to refer to a (physical or virtual) memory address, typically one word in size (whatever a "word" is on a given system).

A C "address" is typically implemented as a machine address -- just as a C int value is typically implemented as a machine word. But a C address (pointer value) is more than just a machine address. It's a value typically represented as a machine address, and it's a value with some specific type.

Keith Thompson
0

A pointer value is an address. A pointer variable is an object that can store an address. This is true because that's what the standard defines a pointer to be. It's important to tell it to C novices because C novices are often unclear on the difference between a pointer and the thing it points to (that is to say, they don't know the difference between an envelope and a building). The notion of an address (every object has an address and that's what a pointer stores) is important because it sorts that out.

However, the standard talks at a particular level of abstraction. Those people the author talks about who "know what addresses are about", but who are new to C, must necessarily have learned about addresses at a different level of abstraction, perhaps by programming assembly language. There is no guarantee that the C implementation uses the same representation for addresses as the CPU's opcodes use (referred to as "the store address" in this passage), which these people already know about.

He goes on to talk about "perfectly reasonable address manipulation". As far as the C standard is concerned there's basically no such thing as "perfectly reasonable address manipulation". Addition is defined on pointers and that is basically it. Sure, you can convert a pointer to integer, do some bitwise or arithmetic ops, and then convert it back. This is not guaranteed to work by the standard, so before writing that code you'd better know how your particular C implementation represents pointers and performs that conversion. It probably uses the address representation you expect, but if it doesn't, that's your fault because you didn't read the manual. That's not confusion, it's incorrect programming procedure ;-)
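A sketch of where the line falls, assuming the implementation provides the optional `uintptr_t`: converting a valid `void *` to `uintptr_t` and back is guaranteed to compare equal, but treating the integer as a byte address (as `looks_aligned` does) is an implementation-specific assumption. Both function names are hypothetical.

```c
#include <stdint.h>

/* Guaranteed (when uintptr_t exists): void * -> uintptr_t -> void *
   gives back a pointer that compares equal to the original. */
int round_trip_ok(int *p)
{
    uintptr_t n = (uintptr_t)(void *)p;
    void *back = (void *)n;
    return back == (void *)p;
}

/* Implementation-specific: assumes the integer is a plain byte address,
   which holds on common flat-memory platforms but is not promised by C.
   alignment must be nonzero. */
int looks_aligned(const void *p, uintptr_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```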

In short, C uses a more abstract concept of an address than the author does.

The author's concept of an address of course is also not the lowest-level word on the matter. What with virtual memory maps and physical RAM addressing across multiple chips, the number that you tell the CPU is "the store address" you want to access has basically nothing to do with where the data you want is actually located in hardware. It's all layers of indirection and representation, but the author has chosen one to privilege. If you're going to do that when talking about C, choose the C level to privilege!

Personally I don't think the author's remarks are all that helpful, except in the context of introducing C to assembly programmers. It's certainly not helpful to those coming from higher level languages to say that pointer values aren't addresses. It would be far better to acknowledge the complexity than it is to say that the CPU has the monopoly on saying what an address is and thus that C pointer values "are not" addresses. They are addresses, but they may be written in a different language from the addresses he means. Distinguishing the two things in the context of C as "address" and "store address" would be adequate, I think.

Steve Jessop
0

Simply put, pointers are actually the offset part of the segmentation mechanism, which is translated to a linear address by segmentation and then to a physical address by paging. Physical addresses are what actually address your RAM.

       Selector  +--------------+         +-----------+
      ---------->|              |         |           |
                 | Segmentation | ------->|  Paging   |
        Offset   |  Mechanism   |         | Mechanism |
      ---------->|              |         |           |
                 +--------------+         +-----------+
        Virtual                   Linear                Physical
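As a rough sketch of the paging step, assuming classic 32-bit x86 paging without PAE and with 4 KiB pages, the linear address splits into a 10-bit page-directory index, a 10-bit page-table index and a 12-bit offset. The names are hypothetical, and the walk through the page tables themselves is not modelled; the caller supplies the frame base that walk would produce.

```c
#include <stdint.h>

struct linear_parts {
    uint32_t dir_index;    /* bits 31..22: page-directory entry */
    uint32_t table_index;  /* bits 21..12: page-table entry */
    uint32_t offset;       /* bits 11..0: byte within the 4 KiB page */
};

struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts parts;
    parts.dir_index   = linear >> 22;
    parts.table_index = (linear >> 12) & 0x3FFu;
    parts.offset      = linear & 0xFFFu;
    return parts;
}

/* Given the physical base of the page frame found via those indices,
   the physical address is that frame base plus the page offset. */
uint32_t physical_from_frame(uint32_t frame_base, struct linear_parts parts)
{
    return (frame_base & ~(uint32_t)0xFFF) + parts.offset;
}
```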
router