4

I was doing a programming exercise for C++ and I came across this question:

What, on your system, are the restrictions on the pointer types char*, int*, and void*? For example, may an int* have an odd value? Hint: alignment.

I have nothing to show for what I have tried; I have trouble understanding the question.

IcanCode
  • Related: https://stackoverflow.com/questions/11386946/whats-the-difference-between-sizeof-and-alignof . – janekb04 Nov 02 '21 at 06:37
  • Pedantically, the question is sort of misworded. There are no such restrictions on pointers types themselves. System-specific restrictions must be met for *certain operations* to be legal on typed pointers. –  Nov 02 '21 at 06:43
  • As Frank noted, the wording is confusing. E.g. alignment is a CPU limitation, not a language limitation. Some CPUs, IIRC ARMv5 and earlier, required variables to be naturally aligned, so a 4-byte int would have to be stored at an address divisible by 4, and an int* would have its two least significant bits zero. – Uri Raz Nov 02 '21 at 07:00
  • The wording is certainly confusing. Given `int A[10];`, both `A+1` and `A+2` are pointers, and logically one is odd. A common _binary_ representation may have the lower bit set to zero in both cases, but C++ does not prescribe a binary representation. It's entirely valid for a CPU to use pointers in the same way that C++ does, in which case the C++ implementation would implement `A+1` indeed as an increment by one. – MSalters Nov 02 '21 at 08:19
  • Well, strictly speaking, you _can_ have odd `int` pointers on certain platforms (surprisingly common ones). It comes with a performance penalty, but it doesn’t crash the Matrix. – Andrej Podzimek Nov 02 '21 at 08:57
  • @MSalters `A+1` and `A+2` are `sizeof(int)` apart – Caleth Nov 02 '21 at 11:58
  • @Caleth: Indeed, _in bytes_. There's no rule in C++ that says a CPU must implement an `int*` as a byte pointer, though. `A+1` could be represented as `0x1F` and `A+2` as `0x20`. However, on such a platform `(char*)(A+1)` could be `0x3E` - casts do not need to preserve binary representation. That's why you **can** cast `int*` to `char*` but not `int**` to `char**`. – MSalters Nov 02 '21 at 12:13
  • @MSalters or `sizeof(int)` might be 1 – Caleth Nov 02 '21 at 12:15

3 Answers

3

Objects of a given type can only be stored in memory at addresses that are a multiple of their alignment.

Also, a valid pointer contains the memory address of an object of its type.

Combining these two, a valid pointer must contain an address that is a multiple of the alignment of the type it points to.

You can ask the compiler to give you the alignment of a type for its current target system by using the alignof() operator. For example:

#include <iostream>

int main() {
  std::cout << "pointers to float must contain a multiple of " << alignof(float) << "\n";
}
2

This is not an answer. It’s just an example showing that the question is way too open-ended. Is it referring to the alignment of data structures outlined by a language standard or perhaps to memory alignment requirements of a particular hardware platform?

Let me share a secret:

#include <cstdint>
#include <ios>
#include <iostream>

using std::uint8_t;
using std::uint32_t;
using std::uint64_t;

int main() {
  const uint64_t something{0x1020304050607080};
  std::cout << std::hex;
  for (const uint32_t shift : {0, 1, 2, 3, 4}) {
    const uint32_t *const pointer =
        reinterpret_cast<const uint32_t*>(
            reinterpret_cast<const uint8_t*>(&something)
            + shift);
    std::cout << pointer << " --> " << *pointer << std::endl;
  }
}

Don’t try this^^^ at home. (Or do, just for fun.) On x86_64 this is no big deal. Possible output:

  • just built and executed on x86_64:

    0x7ffc22fd30f8 --> 50607080
    0x7ffc22fd30f9 --> 40506070
    0x7ffc22fd30fa --> 30405060
    0x7ffc22fd30fb --> 20304050
    0x7ffc22fd30fc --> 10203040
    
  • under valgrind on x86_64:

    0x1fff0007c0 --> 50607080
    0x1fff0007c1 --> 40506070
    0x1fff0007c2 --> 30405060
    0x1fff0007c3 --> 20304050
    0x1fff0007c4 --> 10203040
    
  • just built and executed on RISC-V (rv64g):

    0x3fffed61c8 --> 50607080
    0x3fffed61c9 --> 40506070
    0x3fffed61ca --> 30405060
    0x3fffed61cb --> 20304050
    0x3fffed61cc --> 10203040
    

Do any pointers look odd anywhere? Pun intended.

A hypothetical overly clever compiler could emulate this^^^ behavior on any platform. However, making the shift (e.g.) a user input can easily rule out that case.

In general, misaligned pointers are discouraged, but detailed architecture overviews are hard to find. It would actually be great fun to find a platform (architecture + OS + libraries) where one gets a SIGBUS for this. Sadly, I don’t have such a system configured, and it won’t give me a SIGBUS on my ARM64 phone.

Back to the question:

  • Can an int* pointer be an odd number? Yes, it can.
  • Should an int* pointer be odd (or otherwise misaligned)? No, because
    • misalignment usually comes with a performance penalty, which can range from significant to huge (in the worst case, trap handlers are involved to emulate the access) and
    • the usual atomicity guarantees may not apply (so e.g. RCU-based algorithms using unaligned pointers may be at risk).
Andrej Podzimek
1

Approaching the question from the other end: there are address space limitations. For example:

For a 64-bit process on 64-bit Windows, virtual address space is the 128-terabyte range 0x000'00000000 through 0x7FFF'FFFFFFFF

This is not specific to a type; it applies equally to int* and char*.

And of course there is alignment, which the other answers already cover.


This knowledge of pointer limits is sometimes used to pack extra data into a pointer. This may be useful:

  • To conserve memory: say, a red-black tree node stores its red/black bit in the pointer to its parent to avoid spending extra memory on it.
  • To make synchronization algorithms fit into the size of an atomic operation: for example, a pointer that carries a counter to mitigate ABA keeps the counter in spare pointer bits.

This is kind of cursed knowledge, though, especially where address space limits are concerned: limits tend to increase.

(For example, a 32-bit Windows process used to be limited to 2 GB, but on 64-bit Windows a 32-bit process can use 4 GB. To keep old processes that can't handle the new range working, there is the /LARGEADDRESSAWARE flag, which defaults to off.)

Alex Guteniev