How to understand "C++ allows sizeof(char*) != sizeof(int*)"?

Question

I am reading this post, which is about char and byte, and came across the following statement:

An int* could still be implemented as a single hardware pointer, since C++ allows sizeof(char*) != sizeof(int*).

How to understand 'C++ allows sizeof(char*) != sizeof(int*)'?

@Nan: Can you explain what leads you to think that they would be? That would help us clear whatever misconception is leading you to that conclusion. — icktoofay, Dec 31 '15 at 02:45
@NanXiao Only if you can find a statement in the C++ language specification that says so. Why are you removing the closing quotation mark? — user207421, Dec 31 '15 at 02:47
@EJP: Sorry, I don't read C++ specification seriously. Just from my intuition, every pointer should has the same size, the length of machine word. — Nan Xiao, Dec 31 '15 at 02:57
The idea may be confusing but it is possible to have different sizes for different pointer types. Look here for examples [link](http://stackoverflow.com/questions/916051/are-there-any-platforms-where-pointers-to-different-types-have-different-sizes) — , Dec 31 '15 at 03:01
Sounds like you're thinking of pointers as glorified integers — M.M, Dec 31 '15 at 03:08
@NanXiao infamously class pointers in [VC++ for the longest time were fat pointers, sometimes up to 16bytes or larger even on a 32bit machine](http://stackoverflow.com/q/12006854/332733). The point is that the compiler is free to make pointers whatever it wants to for purposes of optimization. So we shouldn't assume. — Mgetz, Dec 31 '15 at 03:51
@Mgetz: the reasons for pointers to member functions differing in size are completely unrelated to this question. Separately, this question's issue is *not* generally *"for purposes of optimization"* - some systems had functional hardware reasons for pointers being of different sizes, and it would have been actively perverse and unhelpful to adopt the largest size for all so I wouldn't call not having done so an optimisation - more avoiding a crazy pessimisation. — Tony Delroy, Dec 31 '15 at 09:11
I covered alot of this ground in my answer to [Why must an enumeration's size be provided when it is forward declared?](http://stackoverflow.com/a/29035972/1708801) — Shafik Yaghmour, Dec 31 '15 at 09:26
Possible duplicate of [Does the size of pointers vary in C?](http://stackoverflow.com/questions/3520059/does-the-size-of-pointers-vary-in-c) — phuclv, Dec 31 '15 at 09:42
[Are there any platforms where pointers to different types have different sizes?](http://stackoverflow.com/q/916051/995714), [Can the size of pointers vary between data and function pointers?](http://stackoverflow.com/q/1473935/995714) — phuclv, Dec 31 '15 at 09:43
@LưuVĩnhPhúc this is C++ not C the question may be a dupe, but not of that question — Mgetz, Dec 31 '15 at 13:24

score 3 · Answer 1 · answered Dec 31 '15 at 03:26

There are (or were) machines which could only address entire "words", where a word was large enough to hold several characters. For example, the PDP-6/10 had a word size of 36 bits. On such a machine, you might implement 9-bit bytes and represent a byte pointer as the combination of a word pointer and a bit index within the word. A naïve implementation would require two words for such a pointer, even though an integer pointer would be just a word pointer, occupying a single word.

(The real PDP-6/10 allowed for smaller character sizes -- 6- and 7-bit codings were common, depending on use case -- and since a pointer could not occupy a whole word, it was possible to make a character pointer including bit offset and word address fit inside a single word. But a similar architecture these days would not have the draconian restriction on address space, so that wouldn't work anymore.)
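The single-word packing described in that parenthetical can be sketched in C++. This is only an illustration and not PDP-10 code: the 64-bit word, the 48-bit address field, the 8 bytes per word, and the names `PackedBytePtr`, `pack`, `word_index`, `byte_offset` and `next_byte` are all assumptions made up for the example.

```cpp
#include <cstdint>

// Hypothetical layout: the low 48 bits hold a word address,
// the bits above them hold the byte offset within that word.
using PackedBytePtr = std::uint64_t;

constexpr unsigned kAddrBits = 48;
constexpr unsigned kBytesPerWord = 8;
constexpr std::uint64_t kAddrMask = (std::uint64_t{1} << kAddrBits) - 1;

// Combine a word address and a byte offset into one word.
constexpr PackedBytePtr pack(std::uint64_t word, unsigned offset)
{
    return (static_cast<std::uint64_t>(offset) << kAddrBits) | (word & kAddrMask);
}

constexpr std::uint64_t word_index(PackedBytePtr p) { return p & kAddrMask; }

constexpr unsigned byte_offset(PackedBytePtr p)
{
    return static_cast<unsigned>(p >> kAddrBits);
}

// "++p" for the packed pointer: step to the next byte,
// moving on to the next word after the last byte of this one.
constexpr PackedBytePtr next_byte(PackedBytePtr p)
{
    return byte_offset(p) + 1 < kBytesPerWord
        ? pack(word_index(p), byte_offset(p) + 1)
        : pack(word_index(p) + 1, 0);
}

static_assert(word_index(pack(1000, 5)) == 1000, "word address survives packing");
static_assert(byte_offset(pack(1000, 5)) == 5, "byte offset survives packing");
static_assert(next_byte(pack(1000, 7)) == pack(1001, 0), "wraps into the next word");
```

The trade-off is the one mentioned above: the packed form stays one word wide, but only because part of the word is given up as address space.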

songyuanyao · Accepted Answer · 2015-12-31T03:51:21.850

In short, the standard doesn't guarantee that the two sizes are equal; the result is implementation-defined.

From the standard about sizeof ($5.3.3/1 Sizeof [expr.sizeof])

The sizeof operator yields the number of bytes in the object representation of its operand.

and a pointer is a compound type ($3.9.2/1.3 Compound types [basic.compound])

pointers to void or objects or functions (including static members of classes) of a given type, 8.3.1;

and ($3.9.2/3 Compound types [basic.compound])

The value representation of pointer types is implementation-defined.

even though ($3.9.2/3 Compound types [basic.compound])

Pointers to layout-compatible types shall have the same value representation and alignment requirements (3.11).

but char and int don't need to have the same value representation. The standard only says ($3.9.1/2 Fundamental types [basic.fundamental])

There are five standard signed integer types : “signed char”, “short int”, “int”, “long int”, and “long long int”. In this list, each type provides at least as much storage as those preceding it in the list.

and ($3.9.1/3 Fundamental types [basic.fundamental]) etc.

each signed integer type has the same object representation as its corresponding unsigned integer type.
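To see what a particular implementation actually chose, a small sketch like the one below may help. The function names are made up for this example. The only relationship it asserts is that `void*` and `char*` match, which [basic.compound] requires; equality with `int*` is merely reported, since nothing guarantees it.

```cpp
#include <cstdio>

// Required by the standard: void* has the same representation as char*.
static_assert(sizeof(void*) == sizeof(char*), "void* and char* must match");

// Not required: true only if this implementation happens to use
// the same size for char* and int*.
constexpr bool char_and_int_pointers_match()
{
    return sizeof(char*) == sizeof(int*);
}

// Prints the pointer sizes chosen by the current implementation.
void print_pointer_sizes()
{
    std::printf("sizeof(char*)     = %zu\n", sizeof(char*));
    std::printf("sizeof(int*)      = %zu\n", sizeof(int*));
    std::printf("sizeof(void*)     = %zu\n", sizeof(void*));
    std::printf("sizeof(void(*)()) = %zu\n", sizeof(void (*)()));
    std::printf("char* and int* %s in size here\n",
                char_and_int_pointers_match() ? "agree" : "differ");
}
```

On mainstream byte-addressed platforms all four lines will most likely show the same number, which is why the different-size case looks so surprising.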

Tony Delroy · Answer 3 · 2015-12-31T08:56:37.357

itsnotmyrealname and rici touch on the hardware drivers for this, but I thought it might help to walk through the simplest possible scenario leading to different pointer sizes...

Imagine a CPU that can address 32-bit words of memory, and that the C++ int type is also to be 32 bits wide.

This hypothetical CPU addresses specific words by number: 0 for the first word (bytes 0-3), 1 for the second (bytes 4-7), and so on. int*{0} is therefore the first word in memory (assuming no bizarre nullptr shenanigans require otherwise), int*{1} the second, etc.

What should the compiler do to support 8-bit char types? It may have to implement char* support using an int* to identify the word in memory, but it would still need an extra two bits to store 0, 1, 2 or 3 to say which of the bytes in that word is being pointed to. It would effectively need to generate machine code much as a C++ program might if using...

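struct __char_ptr
{
    unsigned* p_;        // the 32-bit word containing the byte
    unsigned byte_ : 2;  // which byte within *p_ (0 to 3)
    // extract the selected byte
    char get() const { return (*p_ >> (8*byte_)) & 0xFF; }
    // clear the selected byte, then store c there
    void set(char c)
    {
        *p_ &= ~(0xFFu << (8*byte_));
        *p_ |= unsigned((unsigned char)c) << (8*byte_);
    }
};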

On such a system, sizeof(__char_ptr) > sizeof(int*). The C++ Standard's flexibility allows compliant C++ implementations for (and code portability to/from) weird systems with this or similar issues.
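For anyone who wants to experiment with the idea on an ordinary machine, here is a minimal sketch of the same emulation over an array of 32-bit words. The names `CharPtr` and `round_trip`, the use of `std::uint32_t`, and the choice to number bytes from the least significant end are assumptions for this example, not something a real compiler is known to do.

```cpp
#include <cstdint>

// Hypothetical emulation of a char* on a word-addressed machine:
// a word pointer plus the index of the byte inside that word.
struct CharPtr
{
    std::uint32_t* word;  // the 32-bit word containing the byte
    unsigned byte;        // 0..3, least significant byte first (an assumption)

    unsigned char get() const
    {
        return static_cast<unsigned char>((*word >> (8 * byte)) & 0xFFu);
    }

    void set(unsigned char c)
    {
        *word &= ~(std::uint32_t{0xFF} << (8 * byte));  // clear the old byte
        *word |= std::uint32_t{c} << (8 * byte);        // store the new one
    }

    // Advance to the next byte, moving to the next word after byte 3.
    CharPtr& operator++()
    {
        if (++byte == 4) { byte = 0; ++word; }
        return *this;
    }
};

// Writes "abcde" (plus its terminator) across two words and reads it back.
bool round_trip()
{
    std::uint32_t memory[2] = {0, 0};
    const char text[] = "abcde";  // 6 bytes, fits in the 8 available

    CharPtr out{memory, 0};
    for (char c : text) {
        out.set(static_cast<unsigned char>(c));
        ++out;
    }

    CharPtr in{memory, 0};
    for (char c : text) {
        if (in.get() != static_cast<unsigned char>(c)) return false;
        ++in;
    }
    return true;
}
```

On common desktop platforms `sizeof(CharPtr)` comes out larger than `sizeof(std::uint32_t*)`, which mirrors the `sizeof(char*) > sizeof(int*)` situation on the hypothetical CPU.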

score 2 · Answer 4 · edited May 23 '17 at 12:15

This is also the reason why we cannot forward declare enums without providing the underlying size; in my answer I provide several references that cover why this is so.

From this comp.lang.c++ discussion, GCC and forward declaration of enum:

[...] While on most architectures it may not be an issue, on some architectures the pointer will have a different size, in case it is a char pointer. [...]

and the C FAQ entry "Seriously, have any actual machines really used nonzero null pointers, or different representations for pointers to different types?" says:

Older, word-addressed Prime machines were also notorious for requiring larger byte pointers (char *'s) than word pointers (int *'s). [...] Some 64-bit Cray machines represent int * in the lower 48 bits of a word; char * additionally uses some of the upper 16 bits to indicate a byte address within a word. [...]

and furthermore:

[...]The Eclipse MV series from Data General has three architecturally supported pointer formats (word, byte, and bit pointers), two of which are used by C compilers: byte pointers for char * and void *, and word pointers for everything else. For historical reasons during the evolution of the 32-bit MV line from the 16-bit Nova line, word pointers and byte pointers had the offset, indirection, and ring protection bits in different places in the word. Passing a mismatched pointer format to a function resulted in protection faults. Eventually, the MV C compiler added many compatibility options to try to deal with code that had pointer type mismatch errors. [...] The old HP 3000 series uses a different addressing scheme for byte addresses than for word addresses; like several of the machines above it therefore uses different representations for char * and void * pointers than for other pointers. [...]
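The enum restriction mentioned at the top of this answer can be illustrated with a short sketch. The type names (`Color`, `Mode`, `Pixel`) are hypothetical. The point is that once the underlying type is fixed, the compiler knows the enum's size, and therefore (on machines like those above) which pointer representation to use, before it has seen the enumerators.

```cpp
// Opaque declaration: allowed because the underlying type is stated,
// so the size is known before the enumerators are seen.
enum Color : unsigned char;

// A scoped enum has a fixed underlying type of int by default,
// so it can be declared early as well.
enum class Mode;

// Both enums can already be used, including through pointers.
struct Pixel
{
    Color* color;
    Mode mode;
};

// enum Shape;  // ill-formed in standard C++: no fixed underlying type

// The full definitions can follow later.
enum Color : unsigned char { Red, Green, Blue };
enum class Mode { Fast, Safe };

static_assert(sizeof(Color) == sizeof(unsigned char), "size fixed by the declaration");
static_assert(sizeof(Mode) == sizeof(int), "scoped enums default to int");
```

Some compilers accept the commented-out `enum Shape;` as an extension, which works only where every pointer has the same representation anyway.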

Trevor Hickey · Answer 5 · 2015-12-31T03:56:16.843

The standard says:

5.3.3 Sizeof
sizeof(char) , sizeof(signed char) and sizeof(unsigned char) are 1 . The result of sizeof applied to any other fundamental type ( 3.9.1 ) is implementation-defined.

Since pointers are "compound types", and the standard makes no mention of byte size consistency between pointers, the compiler writers are free to do as they wish.
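The fundamental/compound distinction can be checked directly with the standard `<type_traits>` header. This is just a sketch of that check; it shows the classification of the types, not anything about their sizes beyond `sizeof(char)`.

```cpp
#include <type_traits>

// char and int are fundamental types.
static_assert(std::is_fundamental<char>::value, "char is fundamental");
static_assert(std::is_fundamental<int>::value, "int is fundamental");

// Pointers to them are compound types, not fundamental ones.
static_assert(std::is_compound<char*>::value, "char* is compound");
static_assert(std::is_compound<int*>::value, "int* is compound");
static_assert(!std::is_fundamental<char*>::value, "char* is not fundamental");

// The only size the quoted paragraph pins down.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");
```

If any of these assertions were false the translation unit would fail to compile, so a clean build is the whole result.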

edited Dec 31 '15 at 03:56

answered Dec 31 '15 at 02:48


He said `sizeof(char*)`, not `char`. He's asking about why two pointers may not have the same size. – Nicol Bolas Dec 31 '15 at 02:54
@NicolBolas That comes under 'any other fundamental type'. A pointer to a fundamental type is a fundamental type unless I am greatly mistaken. – user207421 Dec 31 '15 at 02:55
@EJP: Nope. Pointers are not fundamental types. They are compound types. – Benjamin Lindley Dec 31 '15 at 03:07
@EJP Yes. It seems "std::is_compound::value" emits true – Trevor Hickey Dec 31 '15 at 03:13
