1

I've recently tried to really come to grips with references and pointers in C++, and I'm getting a little bit confused. I understand the * and & operators, which respectively get the value at an address and get the address of a value. However, why can't these simply be used with basic types like ints?

I don't understand why you can't, for example, do something like the following and not use any weird pointer variable creation:

string x = "Hello";
int y = &x; //Set 'y' to the memory address of 'x'
cout << *y; //Output the value at the address 'y' (which is the memory address of 'x')

In my mind, the code above should output the value of 'x': 'y' contains the memory address of 'x', and hence '*y' should be 'x'. It doesn't compile, though. The compiler tells me it can't convert from a string to an int, which doesn't make much sense to me, since you'd think a memory address could be stored in an int just fine.

Why do we need special pointer variable declarations (e.g. string *y = &x)? And if we take the * operator in that declaration literally, we are setting the value of 'y' to the memory address of 'x', yet later, when we want to access the value at that memory address ('&x'), we use the same '*y' that we previously set to the memory address.

  • 2
    You can't dereference an `int` variable, so `*y` is not defined. – Niklas B. Apr 21 '12 at 16:45
  • 1
    It would be interesting to know why you believe a memory address should be storable in an `int` - given that on some 64-bit platforms (with a 64-bit address space), `int` is defined as a 32-bit integer. – Damien_The_Unbeliever Apr 21 '12 at 16:48
  • 2
    @Damien_The_Unbeliever: Then rephrase the question as "why not use `intptr_t` everywhere?" or "why not use `void*` everywhere?". He wants to know why all the different pointer types are necessary. – Ben Voigt Apr 21 '12 at 16:53
  • @BenVoigt - I think there are two separate concepts here - 1) Why should pointers be thought of as something different than integers, and 2) Why are pointers to different types different. Both ideas/concepts are worth exploring. – Damien_The_Unbeliever Apr 21 '12 at 18:44
  • @Damien: There isn't much reason for `void*` and `intptr_t` to be separate. You got caught up in the size thing, but there are pointer-sized integers, so that's more a question of why `int` and `intptr_t` are different sizes, and has nothing to do with pointers or this question. – Ben Voigt Apr 21 '12 at 19:15

6 Answers

3

C and C++ resolve type information at compile-time, not runtime. Even runtime polymorphism relies on the compiler constructing a table of function pointers with offsets fixed at compile time.

For that reason, the only way the program can know that cout << *y; is printing a string is that y is strongly typed as a pointer-to-string (std::string*). The program cannot, from the address alone, determine that the object stored at address y is a std::string. (Even C++ RTTI does not allow this; you need enough type information to identify a polymorphic base class.)
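A minimal sketch of this point, not taken from the original answer: the helper names print_through and print_as are hypothetical. The overload of operator<< is picked from the static type of the pointer, and a void* holding the same address can only be printed once the programmer states the pointed-to type.

```cpp
#include <sstream>
#include <string>

// Returns what `os << *p` would print. Which operator<< overload runs is
// decided at compile time from the static type T, not from the address.
template <typename T>
std::string print_through(const T* p) {
    std::ostringstream os;
    os << *p;
    return os.str();
}

// A void* carries only an address. To dereference it, the caller has to
// supply the pointed-to type T explicitly; nothing in memory records it.
template <typename T>
std::string print_as(const void* address) {
    return print_through(static_cast<const T*>(address));
}
```

Calling print_through(&x) on a std::string x needs no extra information, whereas print_as<std::string>(p) on a const void* p only works because the caller names the type. Naming the wrong type would compile and then misbehave at runtime.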

Ben Voigt
  • So the idea is that a pointer is just a nice way to handle the whole "look, a memory address which begins a certain structure of data which we don't know right now!" problem? –  Apr 21 '12 at 22:17
  • @Joesavage1: You'd better know what the structure (type) of the data is a-priori, because you can't find out from the data. At runtime, the pointer only contains an address, but at compile-time, your variable of pointer type also has information telling the compiler what kind of data it points to. That's how `cout << *y;` knows to print a string (instead of a number, etc.), because `y` is a `std::string*`. If you look at HostileFork's answer, to convert from an integer back to a pointer, *you* have to tell the compiler what's pointed-to. That information isn't kept with the object. – Ben Voigt Apr 21 '12 at 22:31
2

In short, C is a typed language. You cannot store arbitrary things in variables.

Check the type safety article on Wikipedia. C/C++ prevents problematic operations and function calls at compilation time by checking the types of the operands and function parameters (but note that with explicit casts you can change the type of an expression).

It doesn't make sense to store a string in an integer, and in the same way it doesn't make sense to store a pointer in one.

Karoly Horvath
  • Except you CAN store a pointer in an integer (with a cast). It just loses its type information when you do. – Ben Voigt Apr 21 '12 at 16:54
  • Uhm.. I would say you can *cast* a pointer to an integer (lose type) and *then* store it in a pointer (but then you are already storing a pointer..) – Karoly Horvath Apr 21 '12 at 17:00
2

Simply put, a memory address has a type, which is pointer. Pointers are not ints, so you can't store a pointer in an int variable. If you're curious why ints and pointers are not fungible, it's because the size of each is implementation defined (with certain restrictions) and there is no guarantee that they will be the same size.

For instance, as @Damien_The_Unbeliever pointed out, pointers on a system with a 64-bit address space must be 64 bits long, but it is perfectly legal for an int to be 32 bits, as long as it is no longer than a long and no shorter than a short.

As to why each data type has its own pointer type, that's because each type (especially user-defined types) is structured differently in memory. If we were to dereference typeless (or void) pointers, there would be no information indicating how that data should be interpreted. If, on the other hand, you were to create a universal pointer and do away with the "inconvenience" of specifying types, each entity in memory would probably have to be stored alongside its type information. While this is doable, it's far from efficient, and efficiency is one of C++'s design goals.
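One way to see that the pointer type carries layout information is to look at how far a pointer moves when it is incremented. The sketch below is illustrative only, and stride_in_bytes is a hypothetical helper name.

```cpp
#include <cstddef>

// Number of bytes between p and p + 1. The compiler derives this step
// from the static type T; the address value alone says nothing about it.
template <typename T>
std::size_t stride_in_bytes(const T* p) {
    const char* here = reinterpret_cast<const char*>(p);
    const char* next = reinterpret_cast<const char*>(p + 1);
    return static_cast<std::size_t>(next - here);
}
```

For a double* the result is sizeof(double), and for a char* it is 1. A void* has no such step, which is why standard C++ does not allow arithmetic on it or dereferencing it.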

jpm
  • This doesn't explain why `std::string*` is a separate type from `intptr_t`. Size is a very small part of the reason. – Ben Voigt Apr 21 '12 at 16:55
  • @BenVoigt Made some additions that hopefully address the rest of the question. – jpm Apr 21 '12 at 17:13
  • So essentially the pointer isn't necessary in itself as the techniques could theoretically be done through other methods, however it creates a really nice way to do everything including error detection, architecture adaption, reading the right number of bits from memory, knowing what type the thing that's being read will be (type-wise), etc. ? –  Apr 21 '12 at 22:22
1

C++ is a strongly typed language, and pointers and integers are different types. By making those separate types the compiler is able to detect misuses and tell you that what you are doing is incorrect.

At the same time, the pointer type maintains information on the type of the pointed-to object. If you obtain the address of a double, you have to store it in a double*, and the compiler knows that dereferencing that pointer gets you to a double. In your example code, int y = &x; cout << *y;, the compiler would lose the information about what y points to: the type of the expression *y would be unknown, and the compiler would not be able to determine which of the different overloads of operator<< to call. Compare that with std::string *y = &x;. Wherever the compiler sees y, it knows it is a std::string* and knows that dereferencing it yields a std::string (and not a double or any other type), which enables the compiler to statically check all expressions that contain y.
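The static knowledge described here can be checked at compile time. The following is a small sketch using the standard <type_traits> header; deref_type and dereferences_to are hypothetical names introduced for illustration.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// The type of the expression *p for a pointer type P.
template <typename P>
using deref_type = decltype(*std::declval<P>());

// True when dereferencing a P yields an lvalue of type T.
template <typename P, typename T>
constexpr bool dereferences_to() {
    return std::is_same<deref_type<P>, T&>::value;
}

// The compiler knows what each pointer type leads to...
static_assert(dereferences_to<std::string*, std::string>(), "string* -> string");
static_assert(dereferences_to<double*, double>(), "double* -> double");
static_assert(!dereferences_to<double*, std::string>(), "double* is not string*");
// ...and an int is not a pointer at all, so *y is ill-formed for int y.
static_assert(!std::is_pointer<int>::value, "int is not a pointer type");
```

None of these checks cost anything at runtime, since they are all resolved while compiling.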

Finally, while you think that a pointer is just the address of the object and that it should be representable by an integral type (which on 64-bit architectures would have to be int64 rather than int), that is not always the case. There are architectures on which pointers are not really representable by integral values. For example, in architectures with segmented memory, the address of an object can contain both a segment (an integral value) and an offset into the segment (another integral value). On other architectures the size of pointers was different from the size of any integral type.

David Rodríguez - dribeas
  • Two integers can still be stored in a bigger integer ;) – Ben Voigt Apr 21 '12 at 17:02
  • @BenVoigt: But they would not necessarily have the same meaning. Segments could overlap: the actual address in 16-bit x86 was (IIRC) `(segment << 4) + offset`, where `segment` and `offset` were each 16 bits. Even though the actual address was only 20 bits, the segment information is important, so you would have to store the whole 32 bits and then load the two halves separately into different registers prior to dereferencing... Not impossible, but quite a bit of work for a compiler to provide a feature whose only advantage is that it actually loses the type information. – David Rodríguez - dribeas Apr 21 '12 at 17:12
1

Some very low-level languages... like machine language... operate exactly as you describe. A number is a number, and it's up to the programmer to keep track in their head of what it represents. Generally speaking, the hope of higher-level languages is to keep you from the concerns and potential for error that come from that style of development.

You can actually disregard C++'s type safety, at your peril. For instance, gcc on a 32-bit machine I have will print "Hello" when I run this:

string x = "Hello";
int y = reinterpret_cast<int>(&x);
cout << *reinterpret_cast<string*>(y) << endl;

But as pretty much every other answerer has pointed out, there's no guarantee it would work on another computer. If I try this on a 64-bit machine, I get:

error: cast from ‘std::string*’ to ‘int’ loses precision

Which I can work around by changing it to a long:

string x = "Hello";
long y = reinterpret_cast<long>(&x);
cout << *reinterpret_cast<string*>(y) << endl;

The C++ standard specifies minimums for these types, but not maximums, so you really don't know what you're going to be dealing with when you face a new compiler. See: What does the C++ standard state the size of int, long type to be?

So the potential for writing non-portable code is high once you start going this route and "casting away" the safeties in the language. reinterpret_cast is the most dangerous type of casting...

When should static_cast, dynamic_cast, const_cast and reinterpret_cast be used?

But that's just technically drilling down into the "why not int" part specifically, in case you were interested. Note that, as @BenVoigt points out in the comment below, there is an integer type as of C99 called intptr_t which, where it is provided, is guaranteed to hold any pointer to an object. So there are much larger problems when you throw away type information than losing precision... like accidentally casting back to the wrong type!
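As a sketch of the intptr_t variant, assuming the implementation provides std::intptr_t (it is an optional type) and using the hypothetical helper names to_integer and to_string_pointer:

```cpp
#include <cstdint>
#include <string>

// Throws away the static type: the result is just a number.
std::intptr_t to_integer(const std::string* p) {
    return reinterpret_cast<std::intptr_t>(p);
}

// Restores the type. The compiler cannot verify that the number really
// came from a std::string*; the programmer is asserting that it did.
const std::string* to_string_pointer(std::intptr_t value) {
    return reinterpret_cast<const std::string*>(value);
}
```

Round-tripping through to_integer and to_string_pointer gives back the original pointer, so the size problem goes away. Nothing stops the same integer from being cast to a double* instead, though, which is the type-information problem the comment below describes.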

Community
  • 1
    There's `intptr_t`, which is guaranteed to be the right size. The real problem isn't loss of precision, it's loss of compile-time type information. – Ben Voigt Apr 21 '12 at 22:28
  • @BenVoigt Certainly. But I wasn't trying to make my answer cover everything the others had said in that respect, just showed some casting since no one else had (and the question was about int). Didn't mention `intptr_t` because it isn't a type one is likely to find in common use in the program in other places...very few routines you would find would take it as a parameter. Turning something into an intptr_t does have the right capacity, but gives you a "freak" value which may as well be a `char*`. – HostileFork says dont trust SE Apr 21 '12 at 23:29
0

The language is trying to protect you from conflating two different concepts, even though at the hardware level they are both just sets of bits.

Outside of needing to pass values manually between various parts of a debugger, you never need to know the numerical value.

Outside of archaic uses of arrays, it doesn't make sense to "add 10" to a pointer - so you shouldn't treat them as numeric values.

Because the compiler retains type information, it can also prevent you from making mistakes: if all pointers were the same type, the compiler couldn't helpfully point out that what you're trying to dereference as an int is actually a pointer to a string.
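The kind of mistake the compiler catches can be expressed with the standard std::is_convertible trait. This is an illustrative sketch, and converts_implicitly is a hypothetical name.

```cpp
#include <string>
#include <type_traits>

// True when a value of type From converts to type To without a cast.
template <typename From, typename To>
constexpr bool converts_implicitly() {
    return std::is_convertible<From, To>::value;
}

// A pointer to a string is not silently accepted as an int...
static_assert(!converts_implicitly<std::string*, int>(), "pointer is not int");
// ...nor as a pointer to int, so it cannot be dereferenced as one.
static_assert(!converts_implicitly<std::string*, int*>(), "string* is not int*");
// Converting to void* is allowed, but that pointer cannot be dereferenced.
static_assert(converts_implicitly<std::string*, void*>(), "string* -> void*");
```

With a single untyped pointer type for everything, the second check would have nothing to distinguish and the error would only show up at runtime.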

Damien_The_Unbeliever
  • If this were the reason, there could be just one type `ptr`, for all pointers. – Ben Voigt Apr 21 '12 at 17:00
  • @chris - I *am* very rusty here - but as I understand it, pointer arithmetic only works with arrays, and as I also understand it, arrays aren't used much in modern C++. Can you correct me? – Damien_The_Unbeliever Apr 21 '12 at 17:04
  • It can "work" with anything if the memory is all there, but arrays would be the most structured use of it I can think of, yes. I guess if the memory is all there, it would kind of have to represent an array anyways. Iterators are like pointers, except they can actually keep track of where the next element is. – chris Apr 21 '12 at 17:13