55

Recently, I came across the following statement:

It's quite common for all pointers to have the same size, but it's technically possible for pointer types to have different sizes.

But then I came across this which states that:

While pointers are all the same size, as they just store a memory address, we have to know what kind of thing they are pointing TO.

Now, I am not sure which of the above statements is correct. The second quoted statement looks like it's from the C++ notes of Computer Science, Florida State University.


Here's why, in my opinion, all pointers should have the same size:

1) Say we have:

int i = 0;
void* ptr = &i; 

Now, suppose the C++ standard allows pointers to have different sizes. Further suppose that on some arbitrary machine/compiler (since it is allowed by the standard), a void* has a size of 2 bytes while an int* has a size of 4 bytes.

Now, I think there is a problem here: the right-hand side is an int*, which has a size of 4 bytes, while the left-hand side is a void*, which has a size of 2 bytes. Thus, when the implicit conversion from int* to void* happens, some information will be lost.

2) All pointers hold addresses. Since for a given machine all addresses have the same size, it is very natural (logical) that all pointers should also have the same size.

Therefore, I think that the second quote is true.


My first question is what does the C++ standard say about this?

My second question is: if the C++ standard does allow pointers to have different sizes, is there a reason for it? Allowing pointers to have different sizes seems a bit unnatural to me (considering the two points I explained above), so I am pretty sure that the standard committee has already given this some thought and has a reason for allowing it. Note that I am asking this second question only if the standard does allow pointers to have different sizes.

Jason
  • 4
    One key point that you seem to be missing in your analysis is that not all systems have a single, uniform, size of address for all possible types of data and code. Some DSPs, for example, which use a native 16-bit word size require an extra bit for addressing `char` types (and thus also for `void*`). Other less ‘general purpose’ hardware may also have similarly unusual memory requirements. – Austin Hemmelgarn Apr 14 '22 at 21:49
  • All pointers are not created equal. Since C++ (like C) needs to be closer to the metal, C++ must be able to handle these issues. This is more true in the embedded world, such as ARM... – ChuckCottrill Apr 15 '22 at 02:06
  • 2
    @ChuckCottrill: ARM is not a very good example; it's a normal byte-addressable architecture with a simple 32-bit flat memory model, where all pointers (code and data) are equivalent. (Some old ARMs were Harvard, so code and data pointers pointed into different address-spaces, but still had the same size.) Better examples would be DSPs like mentioned in some answers, or Harvard machines with narrower address-space for code. – Peter Cordes Apr 15 '22 at 02:43
  • 1
    Some candidates: *[Are all data pointers the same size in one platform for all data types?](https://stackoverflow.com/questions/1241205/)* (closed as a duplicate of *[Are there any platforms where pointers to different types have different sizes?](https://stackoverflow.com/questions/916051/)*) and *[What is the size of a pointer?](https://stackoverflow.com/questions/6751749/)* – Peter Mortensen Apr 15 '22 at 23:59
  • 1
    @PeterMortensen The first 2 links are for a different programming language so they're not duplicates of this one. The 3rd one is a subset of this one, and this question is much higher quality than that one, as are the answers here compared to the answers on that one, so I've closed that one as a duplicate. Let me know if you find any other, better, targets, and I'll swing the targets around to point to the best one. – cigien Apr 16 '22 at 00:09
  • " loss of information" - for what it's worth, this doesn't actually follow, because not every pointer type needs to meaningfully "use" all the bits it occupies. So, imagine some completely perverse C++ implementation that only addresses a 16-bit memory space, but just decides for the fun of it that `int*` will have size 4 anyway. No problem, just add 2 bytes of padding bits which always take the value 0. No valid C++ program can generate an `int*` with non-0 padding. There's no loss of information in stuffing that into a size-2 `void*`, you just chop off the padding. Daft, but legal (I think). – Steve Jessop Apr 16 '22 at 14:33

10 Answers

91

While it might be tempting to conclude that all pointers are the same size because "pointers are just addresses, and addresses are just numbers of the same size", it is not guaranteed by the standard and thus cannot be relied upon.

The C++ standard explicitly guarantees that:

  • void* has the same size as char* ([basic.compound]/5)
  • T const*, T volatile*, and T const volatile* have the same size as T*. This is because cv-qualified versions of the same type are layout-compatible, and pointers to layout-compatible types have the same value representation ([basic.compound]/3).
  • Similarly, any two enum types with the same underlying type are layout-compatible ([dcl.enum]/9), therefore pointers to such enum types have the same size.
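The guarantees listed above can be written down as compile-time checks. This is only a minimal sketch: the enum names `Color` and `Mode` and the helper `guaranteed_sizes_hold` are made up for illustration, and the checks say nothing about pointer types outside these three categories.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical enum types that share the same underlying type (int).
enum class Color : int { red, green };
enum class Mode : int { on, off };

// void* and char* have the same representation, hence the same size.
static_assert(sizeof(void*) == sizeof(char*), "void* and char* differ in size");

// cv-qualifying the pointee does not change the size of the pointer.
static_assert(sizeof(int*) == sizeof(const int*), "const changes pointer size");
static_assert(sizeof(int*) == sizeof(volatile int*), "volatile changes pointer size");
static_assert(sizeof(int*) == sizeof(const volatile int*), "cv changes pointer size");

// Enums with the same underlying type are layout-compatible.
static_assert(sizeof(Color*) == sizeof(Mode*), "enum pointers differ in size");

// The same checks as a value, so they can also be inspected at run time.
constexpr bool guaranteed_sizes_hold() {
    return sizeof(void*) == sizeof(char*) &&
           sizeof(int*) == sizeof(const volatile int*) &&
           sizeof(Color*) == sizeof(Mode*);
}
```

On a conforming implementation none of these assertions should fire. A failure would point to a non-conforming compiler mode rather than to a bug in the program.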

It is not guaranteed by the standard, but it is basically always true in practice, that pointers to all class types have the same size. The reason for this is as follows: a pointer to an incomplete class type is a complete type, meaning that you are entitled to ask the compiler sizeof(T*) even when T is an incomplete class type, and if you then ask the compiler sizeof(T*) again later in the translation unit after T has been defined, the result must be the same. Furthermore, the result must also be the same in every other translation unit where T is declared, even if it is never completed in another translation unit. Therefore, the compiler must be able to determine the size of T* without knowing what's inside T.

Technically, compilers are still allowed to play some tricks, such as saying that if the class name starts with a particular prefix, then the compiler will assume that you want instances of that class to be subject to garbage collection, and make pointers to it longer than other pointers. In practice, compilers do not seem to use this freedom, and you can assume that pointers to different class types have the same size. If you rely on this assumption, you can put a static_assert in your program and say that it doesn't support the pathological platforms where the assumption is violated.
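A minimal sketch of the incomplete-type argument and of the suggested static_assert follows. `Widget` and `Gadget` are hypothetical class names, not anything from a real code base.

```cpp
#include <cassert>
#include <cstddef>

struct Widget;  // hypothetical class, still incomplete at this point

// Legal: Widget* is a complete type even though Widget is not.
constexpr std::size_t size_before = sizeof(Widget*);

struct Widget {
    int id;
    double weight;
};

// After the definition, the answer must not change.
constexpr std::size_t size_after = sizeof(Widget*);
static_assert(size_before == size_after, "sizeof(Widget*) changed on completion");

struct Gadget {  // another hypothetical, unrelated class
    char tag;
};

// Not guaranteed by the standard; documents the assumption and rejects
// any platform where it does not hold.
static_assert(sizeof(Widget*) == sizeof(Gadget*),
              "this code assumes all class pointers have the same size");
```

The last assertion is the portability guard: it costs nothing at run time and turns a silent assumption into a build error on an unusual platform.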

Also, in practice, it will generally be the case that

  • any two function pointer types have the same size,
  • any two pointer to data member types will have the same size, and
  • any two pointer to function member types will have the same size.

The reason for this is that you can always reinterpret_cast from one function pointer type to another and then back to the original type without losing information, and so on for the other two categories listed above ([expr.reinterpret.cast]). While a compiler is allowed to make them different sizes by giving them different amounts of padding, there is no practical reason to do this.

(However, MSVC has a mode where pointers to members do not necessarily have the same size. It is not due to different amounts of padding, but simply violates the standard. So if you rely on this in your code, you should probably put a static_assert.)
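The round trip between function pointer types mentioned above might look like the following sketch. `add_one`, `IntFn`, `VoidFn`, and `function_pointer_round_trips` are names invented for this illustration.

```cpp
#include <cassert>

// Hypothetical function used only to have something to point at.
int add_one(int x) { return x + 1; }

using IntFn = int (*)(int);
using VoidFn = void (*)();

bool function_pointer_round_trips() {
    IntFn original = &add_one;

    // Convert to an unrelated function pointer type. Calling through
    // `erased` would be undefined behavior; it is only a storage format.
    VoidFn erased = reinterpret_cast<VoidFn>(original);

    // Converting back must yield the original pointer value.
    IntFn restored = reinterpret_cast<IntFn>(erased);
    return restored == original && restored(41) == 42;
}
```

Note that this only shows that one function pointer type can carry the value of another. It does not by itself prove that the two types have the same size.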

If you have a segmented architecture with near and far pointers, you should not expect them to have the same size. This is an exception to the rules above about certain pairs of pointer types generally having the same size.

Brian Bi
  • 2
    It's worth mentioning that C++ on modern mainstream machines (byte addressable, flat memory model) does have the same `sizeof(T*)` for all types, and for non-member function pointers. So when talking about what's actually happening on any given machine, it is normal to point out that all pointers have the same size. (Especially if you're talking in terms of compiling to asm, and a calling convention and ABI). The quote in the question to that effect is one of those useful lies to students, teaching a simpler mental model that's true in practice on the machines the class uses. – Peter Cordes Apr 15 '22 at 01:54
  • 2
    (Historically, `char*` might have taken extra space on a word-addressable machine if it implemented it with an offset inside the word. But C++11's thread-aware memory model [basically forbids that](https://stackoverflow.com/questions/19903338/c-memory-model-and-race-conditions-on-char-arrays); a `char` assignment can't be a non-atomic RMW of the containing word; that would break the case of another thread writing an adjacent array element. So `char` needs to be big enough for the machine to directly address it. Or use an atomic RMW, but that gets very expensive. Or don't support threads) – Peter Cordes Apr 15 '22 at 02:29
  • 4
    @PeterCordes It is not enough to have byte addressable, flat memory for function pointers to be the same size as `void*`: Function pointers may actually be a pair of pointers under the hood. This was the case on the PPC platform where the second pointer allowed access to the global data accessible from the referenced code. Current ABIs usually address global data relative to the program counter, but on PPC you always had to have a pointer to the current table of contents in a register (`r2` if I'm not mistaken). To call a function pointer, you had to set `r2` and then branch to the code. – cmaster - reinstate monica Apr 15 '22 at 11:01
  • 3
    People who have no *particular* reason to expect that their code will be used on obscure architectures where different kinds of pointers are different sizes are fully entitled to expect that all pointers will be the same size. I've used DSP platforms where both `char` and `int` were 16-bit integers, but I didn't expect that code written for other platforms would run without modification on the DSPs, nor that the code I wrote for the DSPs would run without modification on other platforms. The fact that code not written for a DSP wouldn't work on a DSP is hardly a defect. – supercat Apr 15 '22 at 17:38
  • @supercat People can always restrict their target-audience as they wish, and are generally expected to do so far as it makes sense for their situation. No need to dismiss the DSP, or the DS9K. – Deduplicator Apr 15 '22 at 18:57
  • @Deduplicator: I've written C code on a wide range of "weird" platforms. From my experience, in most cases where one would have to jump through hoops to make code run on obscure platforms and common platforms interchangeably, writing separate code for the different platforms will take less time and yield better performance. Having code which will run interchangeably on different platforms is good when it would require at most trivial effort, but from my experience the greater the effort required to make something portable to a particular platform, the lower the probability of payoff. – supercat Apr 15 '22 at 19:32
  • Would you mind clarifying what part of the standard the MSVC pointer-to-member thing violates? – user541686 Apr 15 '22 at 20:34
  • @user541686 There are multiple pointer-to-member and pointer-to-member-function formats MSVC uses, and depending on mode they sometimes make baseless assumptions on how the class is laid out when evaluating them for performance. https://learn.microsoft.com/en-us/cpp/build/reference/vmb-vmg-representation-method?view=msvc-170 https://learn.microsoft.com/en-us/cpp/build/reference/vmm-vms-vmv-general-purpose-representation?view=msvc-170 – Deduplicator Apr 15 '22 at 21:07
  • @Deduplicator: I'm aware how MSVC works, yes. What I was asking about is how that contradicts the standard. "Baseless" doesn't imply "violates the standard". (In fact there are quite a few things in the standard that I see as baseless...) – user541686 Apr 16 '22 at 01:34
  • 2
    @user541686 Well, you have to be able to pass a member-pointer from any TU to any other, whether the TU knows anything (but the name) about the class or not. If some TU knows the class uses at most single inheritance, and another knows nothing, they will disagree on the format to use, violating the standard and breaking the executable. – Deduplicator Apr 16 '22 at 03:53
  • @Deduplicator: Ah, I think the piece of the puzzle I was missing was that it's legal to declare pointers-to-members of forward-declared classes. I thought it wasn't legal for some reason. Thanks! – user541686 Apr 16 '22 at 21:51
  • 1
    Beware. Platforms exist where `sizeof(char *) != sizeof(const char *)`; quoting the standard will not save you from the disaster if you assume it where it is not true. – Joshua Apr 17 '22 at 02:57
  • 2
    @Joshua Can you name one? Because I cannot fathom why there should be any difference in representation at all. Not that such a difference seems unconformant. – Deduplicator Apr 17 '22 at 08:54
  • 1
    @Deduplicator: Target: Microchip CPUs. The size difference is because a pointer to RAM will always fit in two bytes but a pointer to ROM (where string constants live) requires three bytes. As a result, casting a `const char *` to a `char *` and back does not work. – Joshua Apr 17 '22 at 14:25
  • 1
    @Joshua So, passing as pointer to constant is a pessimization there? Ouch. – Deduplicator Apr 17 '22 at 15:14
  • It would be really difficult to construct a platform where "any two function pointers are the same size" is not true; but if it does occur, type `void (*)(...)` is the function pointer equivalent of `void *`. – Joshua Apr 18 '22 at 16:10
  • @Albert It's guaranteed that there's no loss of information. I don't think you really need to rely on `void*` being physically as long as `int*`. It will usually be true, but an unusual implementation could put padding bytes in `int*` that are not present in `void*`, making the former longer. – Brian Bi Apr 19 '22 at 13:39
  • @Albert There is no loss of information because the standard guarantees no loss of information when going from `int*` to `void*` and then back to `int*`. If there are any padding bits in `int*`, the contents of those bits is not information. – Brian Bi Apr 21 '22 at 18:14
22

Member function pointers can differ:

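#include <cstddef>
#include <iostream>
#include <string>

int main() {
    void* ptr = nullptr;                              // ordinary object pointer
    std::size_t (std::string::*mptr)() = nullptr;     // pointer to member function
    std::cout << sizeof(ptr) << '\n';
    std::cout << sizeof(mptr) << '\n';
}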

This printed

8
16

on my system. The background is that member function pointers need to hold additional information, e.g. about virtuality.
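To illustrate the "virtuality" part: a pointer to a virtual member function has to remember which virtual function to dispatch to, not just one fixed code address. The following is a rough sketch with made-up types `Base` and `Derived`. How the extra information is actually encoded is implementation-specific.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 1; }
};

struct Derived : Base {
    int id() const override { return 2; }
};

// Calls id() through a pointer to member function taken from Base.
// The call still dispatches virtually on the dynamic type of `object`.
int call_through_member_pointer(const Base& object) {
    int (Base::*mptr)() const = &Base::id;
    return (object.*mptr)();
}
```

Passing a `Base` yields 1 and passing a `Derived` yields 2, even though the same member pointer value `&Base::id` is used in both calls.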

Historically, there were systems with 'near' and 'far' pointers, which differed in size as well (16 vs. 32 bits). As far as I am aware, they no longer play any role nowadays, though.

Aconcagua
  • If void* is 8 and member function pointers is 16, then is uintptr_t 8 or 16? – Irelia Apr 14 '22 at 10:45
  • 5
    It is because member pointers aren't actually pointers. All other pointers are pointers and should have the same size. – ixSci Apr 14 '22 at 10:46
  • But why would the address to a member function need to be more than 8 bytes? I understand it can hold virtual information; however wouldn't that just be placed after the address to the function? – Irelia Apr 14 '22 at 10:47
  • 17
    @ixSci: No, there they should not be. There's nothing in the Standard that says so, and that omission is intentional. – MSalters Apr 14 '22 at 10:47
  • @Irelia on my system `uintptr_t` is of size 8, meaning it wouldn't be able to hold a member function pointer *unless* there exist other means to restore the complete information on casting back to member function pointer, which I have quite some doubts about... – Aconcagua Apr 14 '22 at 10:49
  • 2
    @Irelia A member pointer cannot be converted to an integral type, also not to `uintptr_t` in any case. – user17732522 Apr 14 '22 at 10:51
  • @user17732522: I'd have to check if that is not perhaps _unspecified_. In other words, I wouldn't count on that for SFINAE. – MSalters Apr 14 '22 at 10:54
  • @MSalters I don't see anything allowing it in https://www.eel.is/c++draft/expr.reinterpret.cast. – user17732522 Apr 14 '22 at 10:58
  • @MSalters if they shouldn't be of the same size, how could you possibly implement the following cast, which should give back the original pointer? P1->P2->P1 ==> P1 == P1. If sizeof(P2) < sizeof(P1), then P2 can't hold all the data of P1 to restore it correctly later. – ixSci Apr 14 '22 at 10:59
  • 2
    @ixSci Well, in your example `P2` doesn't need to be equal in size to `P1`, solely *not smaller*... – Aconcagua Apr 14 '22 at 11:02
  • It doesn't matter, you can switch sides. If things can have different sizes then eventually there is a smaller thing. And this conversion should yield the same pointer back. – ixSci Apr 14 '22 at 11:04
  • 2
    @ixSci: What you wrote holds **only** if `decltype(P2)==void*`. Also, technically any pointer type may have padding or trap bits. I could add 77 padding bits to `P1`, and the C++ rules don't require that padding bits are conserved. As a more realistic example, 8086 far pointers are 32 bits but only 20 bits are significant. You can squeeze the 32 bits into 3 bytes and back. – MSalters Apr 14 '22 at 11:04
  • 9
    @ixSci `It doesn't matter, you can switch sides.` No, you can't switch sides. There is no rule saying that all pointers can be converted to all other pointers and back without losing the original value. – eerorika Apr 14 '22 at 11:05
  • 1
    @MSalters well, what I wrote is [required by the Standard](https://eel.is/c++draft/expr.reinterpret.cast#7). – ixSci Apr 14 '22 at 11:06
  • 1
    @eerorika yes, there is. As long as alignments don't clash. – ixSci Apr 14 '22 at 11:07
  • @ixSci: The bit you quote already shows why you can't exchange P1 and P2, so your rebuttal of Aconcagua fails. And `void*` doesn't _have_ alignment requirements, so that's why it can be that universal type P2. (There's also `char*`, which for `memcpy` reasons has the same guarantees) – MSalters Apr 14 '22 at 11:12
  • 9
    @ixSci `As long as alignments don't clash.` Hence, **not all**. You cannot deduce the equal pointer sizes based on this rule. Equal size per alignment perhaps, but not all pointers. – eerorika Apr 14 '22 at 11:12
  • 5
    Ok, I was wrong. They indeed might be of different sizes if corresponding objects they point to are of different alignments. Thanks guys for this small discussion. – ixSci Apr 14 '22 at 11:20
  • @MSalters, I'm not sure that (shoving a 16+16 bit segment+offset pointer into 20 bits and back) actually works. You could do it and get back a pointer to the same memory location, but you'd have a hard time guaranteeing you'd get _the same pointer value_ back. – ilkkachu Apr 14 '22 at 21:59
  • @ilkkachu: under C++ rules, the bit pattern doesn't matter. If and only if two pointers point to the same object, they are equal. `0000:C000 == 0C00:0000` since they both have the same value `0C000`. – MSalters Apr 14 '22 at 23:01
  • @MSalters, hmm, right, that actually sounds like it makes sense within the language. Does mean the compiler will have to do that bit of work when comparing pointers, but I guess that's how it goes. Sorry, I should have thought it through. – ilkkachu Apr 15 '22 at 09:18
  • @MSalters: In normal x86 usage, any given region of live storage would always be accessed via pointers having the same segment+offset combination, and all pointers to different parts of the same allocation would share the same segment value. If a pointer that would be valid for accessing a region of storage had a particular segment+offset combination, no other segment+offset combination could be a *valid* pointer to that same storage. Use of an invalid pointer would invoke UB, so even if it would happen to access the same storage as some other seg:ofs combo, the language wouldn't define it. – supercat Apr 15 '22 at 18:19
  • @ilkkachu: There was no need for compilers to use pointer gymnastics except in the seldom-used (and very slow) "huge" model. A call to `malloc(16384)` could only yield `0x1234:0x1000`, if no other *valid* pointers existed that would identify physical addresses 0x13340 to 0x1733F, and no valid pointers whose segment wasn't 0x1234 would identify any such storage until that allocated region was freed. – supercat Apr 15 '22 at 18:24
  • @ilkkachu: A function like `memmove` could (and typically *would*) simply compare the offset parts of pointers when selecting a bottom-up or top-down copy operation, since either the segments would match and the offset comparison would be meaningful, or the segments wouldn't match and the source and destination regions *wouldn't overlap*, meaning bottom-up and top-down copy operations would be *equally acceptable*. Hardware wouldn't care whether a value written to 0x1234:0x1000 was read using that address, or 0x1200:0x1340, or 0x1334:0x0000, but code which wrote storage using 0x1234:0x1000... – supercat Apr 15 '22 at 18:39
  • ...would essentially never use any segment:offset combination other than 0x1234:0x1000 to read it. When I first started with x86 I always thought in terms of physical address, but that's a mistake. It's generally better to think of every segment value as having a (possibly empty) range of "usable" offsets at any given time, such that all segments' ranges of usable offsets represent disjoint physical addresses. – supercat Apr 15 '22 at 18:42
22

A few rules:

  1. The sizes of plain-old-data pointers can differ, e.g. double* can be (and often is) larger than int*. (Think of architectures with off-board floating point units.)

  2. void* must be sufficiently large to hold any object pointer type.

  3. The size of any non-plain-old-data pointer is the same as any other. In other words sizeof(myclass*) == sizeof(yourclass*).

  4. sizeof(const T*) is the same as sizeof(T*) for any T, plain-old-data or otherwise.

  5. Member function pointers are not pointers. Pointers to non-member functions, including static member functions, are pointers.
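Rules 2 and 5 can be checked with the standard type traits and a round trip through `void*`. This is just a sketch: the `Sample` class and the helper `object_pointer_round_trips` are invented for the example.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class with one non-static and one static member function.
struct Sample {
    int value;
    int get() const { return value; }
    static int make() { return 0; }
};

// Object pointers and pointers to free or static member functions are pointers.
static_assert(std::is_pointer<int*>::value, "int* is a pointer");
static_assert(std::is_pointer<int (*)()>::value, "function pointer is a pointer");
static_assert(std::is_pointer<decltype(&Sample::make)>::value,
              "pointer to static member function is an ordinary function pointer");

// Pointers to non-static member functions are a separate category.
static_assert(!std::is_pointer<decltype(&Sample::get)>::value,
              "member function pointer is not a pointer");
static_assert(std::is_member_function_pointer<decltype(&Sample::get)>::value,
              "it is a member function pointer instead");

// Rule 2: void* can hold any object pointer and give it back unchanged.
bool object_pointer_round_trips() {
    Sample s{7};
    void* erased = &s;                                 // implicit conversion
    Sample* restored = static_cast<Sample*>(erased);   // explicit conversion back
    return restored == &s && restored->value == 7;
}
```

The round trip only covers object pointers. Whether a function pointer can be stored in a `void*` at all is conditionally supported, so it is deliberately left out here.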

Bathsheba
  • 7
    For 2.: "any _object_ pointer type", an implementation doesn't need to provide conversion of function pointers to `void*`. – user17732522 Apr 14 '22 at 10:54
  • @user17732522: Indeed, that's what rule (5) is for. – Bathsheba Apr 14 '22 at 10:55
  • Do you mean with (5) that C++98 member function pointers _are_ pointers? Because they're not new to C++11. – MSalters Apr 14 '22 at 10:55
  • 3
    I am referring to free function pointers, not member function pointers. – user17732522 Apr 14 '22 at 10:56
  • 4
    @Bathsheba No. Pointers to functions are pointer types. 5. doesn't apply to them because pointers to functions are not pointers to member functions. To clarify user17732522's comment, pointers to functions aren't guaranteed to be convertible to `void*`. On systems where they aren't convertible, 2. doesn't need to hold. However, on systems where they are convertible, 2. is guaranteed to hold. – eerorika Apr 14 '22 at 10:57
  • Regarding 3. what rule guarantees this? – eerorika Apr 14 '22 at 10:59
  • @eerorika: Right, clearer now I hope. Indeed pointers to C-style functions or `static` member functions are pointers, and `void*` needs to be large enough to accommodate them. – Bathsheba Apr 14 '22 at 11:00
  • @eerorika: It's in the standard somewhere, either explicit or otherwise. – Bathsheba Apr 14 '22 at 11:00
  • @eerorika I think there is only https://www.eel.is/c++draft/basic#compound-3.sentence-14 (as far as explicit guarantees go). – user17732522 Apr 14 '22 at 11:06
  • @Bathsheba: I recall the WG21 discussion on this. The reason for the discussion was `dlopen` and its Windows equivalent `GetProcAddress` (which both support code and data pointers, but `dlopen` formally returns a data pointer and `GetProcAddress` a code pointer). This was the case that led to Conditionally Supported Behavior – MSalters Apr 14 '22 at 11:07
  • @user17732522 Hmm. That rule seems to only cover layout compatible types, rather than all non-pod class types. Doesn't seem a good fit especially since only standard layout types (previously pod types) can be layout compatible with one another. – eerorika Apr 14 '22 at 11:10
  • 7
    @Bathsheba `Indeed pointers to C-style functions or static member functions are pointers, and void* needs to be large enough to accommodate them.` Only in the case that `void*` and pointers to functions are convertible to one another, as I [clarified](https://stackoverflow.com/questions/71870205/do-all-pointers-have-the-same-size-in-c/71870316?noredirect=1#comment127002469_71870413). That convertibility is not guaranteed by C++ standard. – eerorika Apr 14 '22 at 11:20
  • "_Member function pointers are not pointers_" Is this the official position of the C++ standard, or is this an (unofficial) suggestion on how to view things? (e.g. "Well, the standard may call them pointers, but they behave so differently from any other pointer it's really more useful to think of them as their own thing.") – R.M. Apr 14 '22 at 19:06
  • @R.M.: `std::is_pointer_v` is `false`, so I'd count it as official. – MSalters Apr 14 '22 at 19:32
  • 1
    "void* must be sufficiently large to hold any pointer type." --> `void *` has no requirement to fully round-trip a function pointer. Better as ""void* must be sufficiently large to encode any _object_ pointer type." – chux - Reinstate Monica Apr 15 '22 at 01:30
  • 1
    "`double*` often is larger than `int*`" - wait, what C++ implementations are you talking about here? Yes, CPUs might off-board float operations, and yes `double` often is bigger than `int`. But you have to be able to do both `*(double*)std::malloc(sizeof(double) + sizeof(int)) = 0` and `*(int*)std::malloc(sizeof(double) + sizeof(int)) = 0` (I omit the null check for brevity). So what's the point of making `double*` and `int*` different sizes just because `double` and `int` are, given that they both have to represent anything returned from `malloc`? – Steve Jessop Apr 16 '22 at 14:41
  • @SteveJessop There are (freestanding) implementations that don't even *have* `malloc`, so they certainly *don't* have to represent anything returned from `malloc` in every implementation. – Chris Apr 16 '22 at 19:33
  • @Chris: fair point, but I still ask, how many such freestanding C++ implementations are we talking about here that mean we can say C++ implementations "often" make `double*` larger than `int*`? I confess I've gone in way harder on this point than is justified, since I suspect probably Bathsheba just had in mind that `double` is typically larger than `int` (commonly 8 byte `double` vs 4 byte `int`). But I genuinely do want to be given an example of `sizeof(double*) > sizeof(int*)`, because I love seeing worst-case scenarios of malicious standards-compliance ;-) – Steve Jessop Apr 26 '22 at 00:09
  • And even without `malloc` you can placement-new either a double or an int into any sufficiently aligned array of `char`. `malloc` was just what came to mind as an example of why in C++ all pointer-to-object types are morally (albeit not explicitly in the standard) `char*` wearing a false nose and with alignment restrictions for some types. And so there has to be a really good reason for an implementer to actually want to do anything other than use the same representation for them all. – Steve Jessop Apr 26 '22 at 00:12
  • @SteveJessop Are freestanding implementations required to support placement new in the first place? AFAIK it's standards-compliant to always throw std::bad_alloc or something if you try to placement new a double into a char array. – Chris Apr 26 '22 at 01:15
  • I don't know any examples offhand but it wouldn't surprise me to learn that a system exists that does everything floating point in a separate part of the memory. – Chris Apr 26 '22 at 01:16
  • Although I agree "often" is probably a stretch regardless. – Chris Apr 26 '22 at 01:22
  • @Chris: hmm, I thought it was required to support placement new, but I could be wrong and I don't feel strongly enough to go looking for it. Out of interest, if doubles are stored in special FPU RAM, not regular RAM, what happens with classes that contain double data members? What if doubles are in one region of special FPU RAM, floats are in a different region of special FPU RAM, and a class contains both? It's like watching a horror movie - I don't want to believe it can happen, but if it does I sure want to see the gory details. – Steve Jessop Apr 26 '22 at 01:51
  • What I'd expect to happen, btw, is that you can store doubles/floats anywhere, but when you actually perform any arithmetic on them they're copied into FPU, operated on, and copied back -- much as `int` arithmetic might "really" be done in registers, not via opcodes that address RAM. Then, if the implementation wants to additionally for efficiency provide a means to directly reference FPU memory via some implementation-specific stuff, maybe including a special pointer type, that's great. But I'm prepared to believe I have an unduly optimistic view of the world. – Steve Jessop Apr 26 '22 at 01:56
  • @SteveJessop I looked it up in the standard. Placement new doesn't do *anything* besides return the pointer you pass it. So `char array[8]; double *d=new (&array[0]) double;` is just the same as `char array[8]; double *d=(double*)array;`. Which I *think* is UB? I'll have to brush up on the C++ memory model to answer your other question. ;) – Chris Apr 26 '22 at 03:23
  • @Chris: well, you could give placement new an initializer. – Steve Jessop Apr 26 '22 at 03:48
15

suppose the standard C++ allows pointers to have different sizes

The size, structure, and format of a pointer are determined by the architecture of the underlying CPU. Language standards don't have the ability to make many demands about these things because it's not something the compiler implementer can control. Instead, language specs focus on how pointers will behave when used in code. The C99 Rationale document (different language, but the reasoning is still valid) makes the following comments in section 6.3.2.3:

C has now been implemented on a wide range of architectures. While some of these architectures feature uniform pointers which are the size of some integer type, maximally portable code cannot assume any necessary correspondence between different pointer types and the integer types. On some implementations, pointers can even be wider than any integer type.

...

Nothing is said about pointers to functions, which may be incommensurate with object pointers and/or integers.

An easy example of this is a pure Harvard architecture computer. Executable instructions and data are stored in separate memory areas, each with separate signal pathways. A Harvard architecture system can use 32-bit pointers for data but only 16-bit pointers to a much smaller instruction memory pool.

The compiler implementer has to ensure that they generate code that both functions correctly on the target platform and behaves according to the rules in the language spec. Sometimes that means that all pointers are the same size, but not always.

The second reason for having all the pointer to be of the same size is that all pointer hold address. And since for a given machine all addresses have the same size

Neither of those statements is necessarily true. They're true on most common architectures in use today, but they don't have to be.

As an example, so-called "segmented" memory architectures can have multiple ways to format an assembly operation. References within the current memory "segment" can use a short "offset" value, whereas references to memory outside the current segment require two values: a segment ID plus an offset. In DOS on x86 these were called "near" and "far" pointers, respectively, and were 16 and 32 bits wide.

I've also seen some specialized chips (like DSPs) that used two bytes of memory to store a 12-bit pointer. The remaining four bits were flags that controlled the way memory was accessed (cached vs. uncached, etc.) The pointer contained the memory address, but it was more than just that.

What a language spec does with all of this is define a set of rules for how you can and cannot use pointers in your code, as well as what behavior should be observable for each pointer-related operation. As long as you stick to those rules, your program should behave according to the spec's description. It's the compiler writer's job to figure out how to bridge the gap between the two and generate the correct code without you having to know anything about the CPU architecture's quirks. Going outside the spec and relying on unspecified or undefined behavior makes those implementation details relevant, and you no longer have any guarantee about what will happen. I recommend enabling the compiler warning for conversions that result in a loss of data, and then treating that warning as a hard error.

bta
  • 4
    This is good commentary about the general problem, but I think ultimately answers neither of the OP's questions, which are specifically about C++ and the C++ standard. – Spike0xff Apr 15 '22 at 20:40
  • 1
    " Language standards don't have the ability to make many demands about these things because it's not something the compiler implementer can control" I think this is not quite right, the standard can demand this. Though if it did demand a common size for all pointers, the compilers would have to sub-optimally use the largest size all the time. – Fatih BAKIR Apr 16 '22 at 03:32
  • @FatihBAKIR - I suppose they technically *can* demand such things. It's probably more accurate to say doing so would be such a bad idea that few languages would ever do it. It would be far too easy to create a requirement that would be incompatible with a certain CPU architecture, and then you'd never be able to use that language on that CPU. If the language designers want to see widespread use and portable code, they'll keep anything platform-specific out of the spec. Using the largest size doesn't avoid the problem, as pointers can differ in *layout* as well as size. – bta Apr 18 '22 at 18:59
  • Another fun example of special bits in pointers: in ARM/thumb interworking, the least significant bit of a pointer-to-function tells the CPU whether to enter the function in ARM mode or thumb mode (meaning: there are two different sets of opcodes, and it flips between them on the fly). The actual code commences at the same physical address either way, with the lsb "rounded down", as you can see by dumping that address in the debugger. But since functions are at least 2-aligned, there's a spare bit available. – Steve Jessop Apr 26 '22 at 02:09
  • "few languages would ever do it" - low-level languages, anyway. Java is perfectly happy to mandate, for example, that the integer types must be particular exact sizes. If that makes Java somewhat less efficient than it could be on 9-bit architectures then Sun was willing to make the sacrifice ;-) – Steve Jessop Apr 26 '22 at 02:14
10

Your reasoning in the first case is half-correct. void* must be able to hold any int* value. But the reverse is not true. Hence, it's quite possible for void* to be bigger than int*.

The statement also gets more complex if you include other pointer types, such as pointers to functions and pointers to methods.

One of the cases considered by the C++ Standards committee is DSP chips, where the hardware word size is 16 bits, but char is implemented as a half-word. This means char* and void* need one extra bit compared to short* and int*.

MSalters
  • 1
    Re: extra space in `char*` for offset-within-word on a word-addressable machine: C++11's thread-aware memory model [basically forbids that](https://stackoverflow.com/questions/19903338/c-memory-model-and-race-conditions-on-char-arrays); a `char` assignment can't be a non-atomic RMW of the containing word; that would break the case of another thread writing an adjacent array element. So `char` needs to be big enough for the machine to directly address it, e.g. CHAR_BIT = 16. Or use an atomic RMW, but that gets very expensive. – Peter Cordes Apr 15 '22 at 02:33
  • 3
    A C++ implementation that doesn't support threads or async signal / interrupt handlers could still do that. But historically yes, larger `char*` used to be a possible thing. – Peter Cordes Apr 15 '22 at 02:34
  • @PeterCordes: I wish the Standard would recognize that certain features and guarantees should be supported when practical on an implementation's target platform, but viewed as optional on platforms where they could not be supported without degrading the performance of *programs that don't need them*. If a program for a platform with 32-bit addressed storage would need to densely store 8-bit data, and would need atomic 32-bit loads and stores, wouldn't need "independent" 8/16-bit loads and stores, having an implementation use ordinary C constructs for the supported features would be better... – supercat Apr 15 '22 at 20:16
  • ...than requiring that implementations do whatever is necessary to make loads and stores of different `char`-sized objects within machine words behave independently. – supercat Apr 15 '22 at 20:17
  • @supercat You're saying you think `CHAR_BIT = 8` on such a system would still make sense, with the compiler emulating char accesses for you, as an option that would make it not thread/signal safe? Instead of the simple way to implement C++11 on such a system, `CHAR_BIT = 32` which would leave programs to manually pack 8-bit data into words. – Peter Cordes Apr 16 '22 at 00:33
  • 2
    @PeterCordes: A compiler configuration that made CHAR_BIT be 8 and emulated accesses would be able to accommodate a different set of programs from one where CHAR_BIT is 32. Each approach would be more useful than the other for some applications. – supercat Apr 16 '22 at 04:49
  • In practice different compiler options yield a different "C++ implementation" (as far as the standard is concerned). So, as long as both are valid C++ implementations they can have anything or nothing in common otherwise. And if you want your C++ compiler to offer combinations of options that result in a non-conforming C++ implementation, then you're perfectly entitled to document that this is what they do. So in this case I guess you'd have --8bit, --32bit for the different choices of `CHAR_BIT`, and `--8bit -funsafe` for the non-conforming mode where char access is not async-atomic. – Steve Jessop Apr 16 '22 at 15:03
  • It's completely irrelevant whether or not the C++ standard blesses this mode, just as it's completely irrelevant whether or not the IEEE floating-point standard blesses gcc's `-ffast-math`. I suppose it would save some implementation-specific documentation if the C++ standard provided a shorthand to state whether or not you have implemented this hypothetically-optional "atomic char access" feature of C++, and what the implications are if you haven't. – Steve Jessop Apr 16 '22 at 15:05
3

In addition to the requirements of the C++ standard, any implementation that supports the UNIX dlsym() library call must be able to convert a function pointer to a void*. All function pointers must also be the same size.

There have been architectures in the real world where different kinds of pointers have different sizes. One formerly very mainstream example was MS-DOS, where the Compact and Medium memory models could make code pointers larger than data pointers or vice versa. In segmented memory, it was also possible to have object pointers that were different sizes (such as near and far pointers). Finally, some old mainframes had complex pointers that could be different sizes for different types of objects, and fat pointers are even making a comeback on ARM64.

Davislor
2

As an embedded programmer, I wonder whether even these C languages have taken us too far from the machine! :)

The father, "C", was used to design systems (low-level). Part of the reason different pointer variables need not be the same size is that they can refer to physically different system memories. That is, different data at different memory addresses can actually be located on separate electronic integrated circuits (IC)! For example, constant data might be located on one non-volatile IC, volatile variables on another IC, etc. A memory IC might be designed to be accessed 1 byte at a time, or 4 bytes at a time, etc. (what "pointer++" does).

What if the particular memory bus/address space is only a byte wide? (I've worked with those before.) Then pointer==0xFFFFFFFFFFFFFFFF would be wasteful and perhaps unsafe.

kackle123
1

I’ve seen actual code for a DSP that addressed 16-bit units. So if you took a pointer to int, interpreted the bits as an integer, and increased that by one, the pointer would point to the next 16-bit int.

On this system, char was also 16 bits. If char had been 8 bits, then a char* would have been an int pointer with at least one additional bit.

gnasher729
  • 1
    [There are many other old architectures that use word-addressable memory](https://stackoverflow.com/a/6986260/995714) so `char*` would need more significant bits than `int*`. Nowadays almost only DSPs have that feature, because they typically don't operate on bytes but data samples – phuclv Apr 16 '22 at 12:18
-6

Practically, you’ll find that all pointers within one system are the same size, for nearly all modern systems; with ‘modern’ starting at 2000.
The permission for pointers to differ in size comes from older systems using chips like the 8086, 80386, etc., where there were ‘near’ and ‘far’ pointers, of obviously different sizes. It was the compiler’s (and sometimes the developer’s) job to sort out - and remember! - what goes in a near pointer and what goes in a far pointer.

C++ needs to stay compatible with those times and environments.

Aganju
  • 5
    "all pointers within one system are same size, for nearly all modern systems" is incorrect. It is common in 2022 to find systems where function pointers are wider than `void *`. – chux - Reinstate Monica Apr 15 '22 at 01:32
  • 1
    The C++ standard does not allow that, as any pointer needs to be convertible to void* (and back). You probably talk about pointer to method, which is a completely different animal; it needs to store the object and the function pointer to be callable, and is not convertible to void* (at least not back). – Aganju Apr 15 '22 at 01:34
  • 7
    As discussed [in comments on other answers](https://stackoverflow.com/questions/71870205/do-all-pointers-have-the-same-size-in-c#comment127002469_71870413), *function* pointers are not guaranteed to be convertible to/from `void*`. Only pointers to objects have that guarantee across all conforming C++ implementations. You are correct that modern mainstream systems do normally have all pointer types the same size, though, so it's a useful simplification for students to say that, to help them understand the basic concept. – Peter Cordes Apr 15 '22 at 02:38
  • This is clearly wrong, since a large amount of 8 and 16 bit microcontrollers from various manufacturers, released after year 2000, support "banking/paging" extended addresses beyond their default 64kib. – Lundin Oct 04 '22 at 14:44
-6

In modern C++, the standard library provides smart pointers, std::unique_ptr and std::shared_ptr. A unique pointer can be the same size as a regular pointer when no deleter function is stored with it. A shared pointer may be larger, since it stores the pointer itself and may also store a pointer to a control block that maintains the reference counts and the deleter for the object. This control block can be stored together with the allocated object (using std::make_shared), which may make the reference-counted allocation slightly bigger.

See this interesting question: Why is the size of make_shared two pointers?

Juan