
I am reading this post, which is about char and byte, and came across the following words:

An int* could still be implemented as a single hardware pointer, since C++ allows sizeof(char*) != sizeof(int*).

How should I understand 'C++ allows sizeof(char*) != sizeof(int*)'?

user207421
Nan Xiao
  • What *don't* you understand about it? – user207421 Dec 31 '15 at 02:43
  • @EJP: Shouldn't they be the same? – Nan Xiao Dec 31 '15 at 02:44
  • @Nan: Can you explain what leads you to think that they would be? That would help us clear whatever misconception is leading you to that conclusion. – icktoofay Dec 31 '15 at 02:45
  • @NanXiao Only if you can find a statement in the C++ language specification that says so. Why are you removing the closing quotation mark? – user207421 Dec 31 '15 at 02:47
  • @EJP: Sorry, I haven't read the C++ specification carefully. Just from my intuition, every pointer should have the same size, the length of a machine word. – Nan Xiao Dec 31 '15 at 02:57
  • The idea may be confusing but it is possible to have different sizes for different pointer types. Look here for examples [link](http://stackoverflow.com/questions/916051/are-there-any-platforms-where-pointers-to-different-types-have-different-sizes) –  Dec 31 '15 at 03:01
  • Sounds like you're thinking of pointers as glorified integers – M.M Dec 31 '15 at 03:08
  • @NanXiao infamously, class pointers in [VC++ for the longest time were fat pointers, sometimes up to 16 bytes or larger even on a 32-bit machine](http://stackoverflow.com/q/12006854/332733). The point is that the compiler is free to make pointers whatever it wants for purposes of optimization. So we shouldn't assume. – Mgetz Dec 31 '15 at 03:51
  • @Mgetz: the reasons for pointers to member functions differing in size are completely unrelated to this question. Separately, this question's issue is *not* generally *"for purposes of optimization"* - some systems had functional hardware reasons for pointers being of different sizes, and it would have been actively perverse and unhelpful to adopt the largest size for all so I wouldn't call not having done so an optimisation - more avoiding a crazy pessimisation. – Tony Delroy Dec 31 '15 at 09:11
  • I covered a lot of this ground in my answer to [Why must an enumeration's size be provided when it is forward declared?](http://stackoverflow.com/a/29035972/1708801) – Shafik Yaghmour Dec 31 '15 at 09:26
  • Possible duplicate of [Does the size of pointers vary in C?](http://stackoverflow.com/questions/3520059/does-the-size-of-pointers-vary-in-c) – phuclv Dec 31 '15 at 09:42
  • [Are there any platforms where pointers to different types have different sizes?](http://stackoverflow.com/q/916051/995714), [Can the size of pointers vary between data and function pointers?](http://stackoverflow.com/q/1473935/995714) – phuclv Dec 31 '15 at 09:43
  • @LưuVĩnhPhúc this is C++, not C; the question may be a dupe, but not of that question – Mgetz Dec 31 '15 at 13:24

5 Answers

There are (or were) machines which could only address entire "words", where a word was large enough to hold several characters. For example, the PDP-6/10 had a word size of 36 bits. On such a machine, you might implement 9-bit bytes and represent a byte pointer as the combination of a word pointer and a bit index within the word. A naïve implementation would require two words for such a pointer, even though an integer pointer would be just a word pointer, occupying a single word.

(The real PDP-6/10 allowed for smaller character sizes -- 6- and 7-bit codings were common, depending on use case -- and since a word address needed only 18 of the 36 bits, it was possible to fit a character pointer, including the bit offset and the word address, inside a single word. But a similar architecture these days would not have such a draconian restriction on address space, so that trick wouldn't work anymore.)

rici

In short, the standard doesn't guarantee it; the result is implementation-defined.

From the standard, about sizeof (§5.3.3/1 Sizeof [expr.sizeof]):

The sizeof operator yields the number of bytes in the object representation of its operand.

and pointers are compound types (§3.9.2/1.3 Compound types [basic.compound]):

pointers to void or objects or functions (including static members of classes) of a given type, 8.3.1;

and (§3.9.2/3 Compound types [basic.compound]):

The value representation of pointer types is implementation-defined.

even though (§3.9.2/3 Compound types [basic.compound]):

Pointers to layout-compatible types shall have the same value representation and alignment requirements (3.11).

but char and int are not layout-compatible, so they need not have the same value representation, and neither need pointers to them. The standard only says (§3.9.1/2 Fundamental types [basic.fundamental]):

There are five standard signed integer types : “signed char”, “short int”, “int”, “long int”, and “long long int”. In this list, each type provides at least as much storage as those preceding it in the list.

and (§3.9.1/3 Fundamental types [basic.fundamental]), etc.:

each signed integer type has the same object representation as its corresponding unsigned integer type.

songyuanyao

itsnotmyrealname and rici touch on the hardware reasons for this, but I thought it might help to walk through the simplest possible scenario leading to different pointer sizes...

Imagine a CPU that can only address memory in whole 32-bit words, and that the C++ int type is also 32 bits wide.

This hypothetical CPU addresses specific words using a numbering: 0 for the first word (bytes 0-3), 1 for the second (bytes 4-7) and so on. int*{0} is therefore your first word in memory (assuming no bizarre nullptr shenanigans require otherwise), int*{1} the second, etc.

What should the compiler do to support 8-bit char types? It could implement char* using an int* to identify the word in memory, plus an extra two bits to store 0, 1, 2 or 3 saying which byte in that word is being pointed to. It would effectively need to generate machine code much as a C++ program might if using...

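struct __char_ptr
{
    unsigned* p_;         // word address: what an int* would hold (unsigned assumed 32-bit)
    unsigned byte_ : 2;   // which byte of that word: 0, 1, 2 or 3
    char get() const { return static_cast<char>((*p_ >> (8 * byte_)) & 0xFFu); }
    void set(char c)
    {
        *p_ &= ~(0xFFu << (8 * byte_));  // clear the selected byte
        *p_ |= static_cast<unsigned>(static_cast<unsigned char>(c)) << (8 * byte_);  // write c
    }
};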

On such a system, sizeof(__char_ptr) > sizeof(int*). The C++ Standard's flexibility allows compliant C++ implementations for (and code portability to/from) weird systems with this or similar issues.

Tony Delroy

This is also the reason why we cannot forward-declare enums without providing the underlying type; in my answer I provide several references that cover why this is so.

In this comp.lang.c++ discussion, "GCC and forward declaration of enum":

[...] While on most architectures it may not be an issue, on some architectures the pointer will have a different size, in case it is a char pointer. [...]

and from the C FAQ entry "Seriously, have any actual machines really used nonzero null pointers, or different representations for pointers to different types?" we find:

Older, word-addressed Prime machines were also notorious for requiring larger byte pointers (char *'s) than word pointers (int *'s). [...] Some 64-bit Cray machines represent int * in the lower 48 bits of a word; char * additionally uses some of the upper 16 bits to indicate a byte address within a word. [...]

and furthermore:

[...]The Eclipse MV series from Data General has three architecturally supported pointer formats (word, byte, and bit pointers), two of which are used by C compilers: byte pointers for char * and void *, and word pointers for everything else. For historical reasons during the evolution of the 32-bit MV line from the 16-bit Nova line, word pointers and byte pointers had the offset, indirection, and ring protection bits in different places in the word. Passing a mismatched pointer format to a function resulted in protection faults. Eventually, the MV C compiler added many compatibility options to try to deal with code that had pointer type mismatch errors. [...] The old HP 3000 series uses a different addressing scheme for byte addresses than for word addresses; like several of the machines above it therefore uses different representations for char * and void * pointers than for other pointers. [...]

Shafik Yaghmour

The standard says:

5.3.3 Sizeof
sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1. The result of sizeof applied to any other fundamental type (3.9.1) is implementation-defined.

Since pointers are "compound types" rather than fundamental types, and the standard makes no general requirement that pointers to different types have the same size, compiler writers are free to do as they wish.

Trevor Hickey