29

I'm quite new at working with C++ and haven't grasped all the intricacies and subtleties of the language.

What is the most portable, correct and safe way to add an arbitrary byte offset to a pointer of any type in C++11?

SomeType* ptr;
int offset = 12345 /* bytes */;
ptr = ptr + offset;             // <--

I found many answers on Stack Overflow and Google, but they all propose different things. Some variants I have encountered:

  1. Cast to char *:

    ptr = (SomeType*)(((char*)ptr) + offset);
    
  2. Cast to unsigned int:

    ptr = (SomeType*)(((unsigned int)ptr) + offset);
    
  3. Cast to size_t:

    ptr = (SomeType*)(((size_t)ptr) + offset);
    
  4. "The size of size_t and ptrdiff_t always coincide with the pointer's size. Because of this, it is these types that should be used as indexes for large arrays, for storage of pointers and pointer arithmetic." - About size_t and ptrdiff_t on CodeProject

    ptr = (SomeType*)((size_t)ptr + (ptrdiff_t)offset);
    
  5. Or like the previous, but with intptr_t instead of size_t, which is signed instead of unsigned:

    ptr = (SomeType*)((intptr_t)ptr + (ptrdiff_t)offset);
    
  6. Only cast to intptr_t, since offset is already a signed integer and intptr_t is not size_t:

    ptr = (SomeType*)(((intptr_t)ptr) + offset);
    

And in all these cases, is it safe to use old C-style casts, or is it safer or more portable to use static_cast or reinterpret_cast for this?

Should I assume the pointer value itself is unsigned or signed?

Daniel A.A. Pelsmaeker
  • 7
    There isn't any. It's undefined behaviour to add an arbitrary byte offset to a pointer. You can only do arithmetic on pointers that point to the same array (and one past the end of it). – jrok Apr 10 '13 at 18:57
  • It is better not to use C-style casts (one reason is that you can accidentally cast away const-ness). – Zyx 2000 Apr 10 '13 at 18:57
  • Even if it is undefined behavior according to the spec, what is then the most safe and portable way (across common compilers) to add a byte offset to a pointer? – Daniel A.A. Pelsmaeker Apr 10 '13 at 18:59
  • 5
    @jrok It's perfectly well defined to add an arbitrary offset to a pointer. What's undefined is dereferencing a pointer that doesn't point to valid memory. – sfstewman Apr 10 '13 at 18:59
  • 1
    @sfstewman It won't cause errors on the implementations I know, but IIRC there's a clause prohibiting going more than one object past the end of an array (i.e. `int a[5]; a + 5;` is good, `int a[5]; a + 6` is bad). Edit: found a source: http://stackoverflow.com/a/988220/395760 –  Apr 10 '13 at 19:07
  • 6
    @sfstewman: C++ draft n3092 5.7 5: “If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.” – Eric Postpischil Apr 10 '13 at 19:12
  • 2
    @sfstewman You're wrong. The standard explicitly makes it UB (see the comment above). In practice, yeah, it just works, at least until you smash your own stack or something like that. – jrok Apr 10 '13 at 19:14
  • There's a difference between adding a completely arbitrary offset and adding a *bounded* but otherwise arbitrary offset. I think it's reasonable to answer this question as the latter and not to just dismiss it as "You can't do this; it's UB" from interpreting it as the former. (If you're going to take the question completely literally, you could complain about UB from reading `ptr` before it's initialized...) – jamesdlin Apr 10 '13 at 19:22
  • @Virtlink: What is the actual problem you are trying to solve? You should almost never work with byte offsets into normal C or C++ objects (except, of course, for arrays of character type). If you are trying to see the underlying representation of an object, there are ways to do it with `memcpy` that are standard. – Eric Postpischil Apr 10 '13 at 19:41
  • @EricPostpischil I'm writing an x86 OS kernel and that involves _a lot_ of byte offsets. If I ever move to x86-64 systems, or ARM, or whatever, it will involve a lot of changes. Therefore, trivial things such as adding byte offsets have to be done right the first time, so I don't have to fix that _too_. – Daniel A.A. Pelsmaeker Apr 10 '13 at 19:45
  • 1
    *"char is not guaranteed to be 8 bits"* - But it is at least guaranteed to be the same size as what C++ considers a *byte* (the unit in which sizes are measured in C++). – Christian Rau Apr 10 '13 at 19:45
  • 1
    @Virtlink: (0) Do not use byte offsets, even within kernel code, if there is any way possible to avoid them. You can define structures within C++ to access all sorts of things, even at the hardware level, such as page table entries or exception vectors. (1) If you must use byte offsets, write routines or macros, such as `PointerAdd(pointer, offset)`, that do the arithmetic. Then you can just update those routines when you change targets. (2) You do not need to use as many byte offsets as you think. (3) If you think you do, show an example, perhaps in a new question. – Eric Postpischil Apr 10 '13 at 19:50
  • 1
    *"Should I assume the pointer value itself is unsigned or signed?"* - You shouldn't even assume it's an integer. In the end all your solutions converting to integer for doing arithmetic are UB. The only thing you can do with a pointer cast to int is cast it back to a pointer. But casting to int, adding something and casting back is definitely UB. But I have good hope somebody will come up with a very good standard-proved answer for all your possibilities. – Christian Rau Apr 10 '13 at 19:50
  • @Virtlink: Adding `12345` to an `int*` will break on ARM, regardless how you do it. ARM, unlike x86 enforces alignment, and `12345` is not a multiple of 4. (Assuming you stick with the ARM ABI which has `sizeof(int)==4` ) – MSalters Apr 11 '13 at 08:19
  • @MSalters It will work flawlessly. I only get in trouble when I try to dereference the pointer. – Daniel A.A. Pelsmaeker Apr 11 '13 at 09:22
  • 1
    **C++11 standard 5.7.7 (footnote: 82)** "Another way to approach pointer arithmetic is first to convert the pointer(s) to character pointer(s): In this scheme the integral value of the expression added to or subtracted from the converted pointer is first multiplied by the size of the object originally pointed to, and the resulting pointer is converted back to the original type. ... When viewed in this way, an implementation need only provide one extra byte ... just after the end of the object in order to satisfy the “one past the last element” requirements." – Galik Jul 23 '14 at 22:03
  • @EricPostpischil: One problem I'd like to solve when using certain processors is taking a pointer to an object in RAM, and turning it into a pointer to an area of address space which accesses that area of RAM differently. For example, on some processors one may configure things so that address space 0x10000000-0x1000FFFF provides normal access to RAM and 0x20000000-0x2000FFFF will provide read-only access. Or different address spaces may have different numbers of wait states, etc. – supercat Jun 26 '15 at 22:13
  • @EricPostpischil: Obviously the C standard knows nothing about what different address ranges do, but it would be useful to have a means of taking a pointer to one kind of address space and turn it into a pointer in another part of the address space without having to know the size of the thing identified by the pointer. Unfortunately, I've found no clean way to do that. – supercat Jun 26 '15 at 22:16

5 Answers

16

I would use something like:

unsigned char* bytePtr = reinterpret_cast<unsigned char*>(ptr);
bytePtr += offset;
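
As a minimal sketch of how the two lines above might be wrapped for reuse: the names `byte_offset`, `Record` and `byte_offset_matches_element_step` are hypothetical and not part of the answer. The sketch only moves the pointer; whether the result may be dereferenced still depends on what actually lives at the resulting address, and the comments below debate how much of this the standard guarantees.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the answer): moves `ptr` by `offset` bytes.
// T must not be const-qualified here, since reinterpret_cast cannot drop const.
template <typename T>
T* byte_offset(T* ptr, std::ptrdiff_t offset) {
    assert(ptr != nullptr);  // offsetting a null pointer is not meaningful
    unsigned char* bytePtr = reinterpret_cast<unsigned char*>(ptr);
    bytePtr += offset;
    return reinterpret_cast<T*>(bytePtr);
}

// Illustrative record type, assumed only for this example.
struct Record {
    int id;
    int value;
};

// Returns true if stepping by sizeof(Record) bytes lands on the next element.
bool byte_offset_matches_element_step() {
    Record records[2] = {{1, 10}, {2, 20}};
    Record* second = byte_offset(&records[0],
                                 static_cast<std::ptrdiff_t>(sizeof(Record)));
    return second == &records[1] && second->value == 20;
}
```

Keeping the arithmetic in one function also matches the advice in the question's comments: if a target ever needs different handling, only this helper has to change.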
freddy.smith
  • 1
    I would use the more condensed `(reinterpret_cast<unsigned char*>(ptr) + offset)` perhaps wrapped in an inline (possibly template) function, depending on how often I needed it and what the returned type ought to be. – Nik Bougalis Apr 10 '13 at 19:08
  • Please elaborate: why? For example, why not use `intptr_t`, `uintptr_t`, `size_t`, or `ptrdiff_t`? They have been added for a reason, I reckon. And why unsigned, not signed like in the question I linked at #1? – Daniel A.A. Pelsmaeker Apr 10 '13 at 19:09
  • `intptr_t` is an *integer* type guaranteed to be large enough so a pointer fits. Adding `offset` to such a type would translate to adding `(sizeof(intptr_t) * offset)`. As for why I would use `unsigned char` - it's just personal preference. Using `char` shouldn't *really* make any difference for *this* purpose. – Nik Bougalis Apr 10 '13 at 19:22
  • If the sole reason you want to cast to a byte pointer is to add an offset, then it doesn't matter whether you use signed or unsigned; both are 8 bits long. Most pointer types themselves are much larger than a byte; a size_t is normally the same size as an int. – freddy.smith Apr 10 '13 at 19:23
  • I am not sure this is legal in current C++. C++ generally prohibits conversions between pointers to different types of objects. C makes an exception for conversion to pointers to character types. However, the C++ standard only says, as far as I can find, that you can copy the bytes of an object (as by using `memcpy`). Can somebody demonstrate that this is defined by the C++ standard? – Eric Postpischil Apr 10 '13 at 19:24
  • That's what reinterpret_cast is for! – freddy.smith Apr 10 '13 at 19:29
  • @NikBougalis That is only true if you cast to `intptr_t*`, if I'm correct? I proposed casting to `intptr_t` since it is, as you say, guaranteed to be large enough to hold the pointer. And `char` is [not guaranteed to be 8-bits](http://stackoverflow.com/a/4266820/146622) so it doesn't feel very portable. – Daniel A.A. Pelsmaeker Apr 10 '13 at 19:29
  • 1
    @Virtlink: `unsigned char` is preferred for working with bytes because the language standard requires it be a simple binary representation of the value and that all bit patterns correspond to a value. In contrast, `char` and `signed char` might use two’s complement, one’s complement, or signed magnitude and might have bit patterns that do not correspond to a value. – Eric Postpischil Apr 10 '13 at 19:29
  • @alan.dugdall: Can you demonstrate it with language in the C++ standard? – Eric Postpischil Apr 10 '13 at 19:30
  • `class A {}; class B {}; A* a = new A; B* b = reinterpret_cast<B*>(a);` – freddy.smith Apr 10 '13 at 19:33
  • @EricPostpischil `unsigned char` isn't necessarily 8 bits, right? So there goes portability. – Daniel A.A. Pelsmaeker Apr 10 '13 at 19:35
  • @alan.dugdall: Can you demonstrate it with language quoted from the C++ standard? – Eric Postpischil Apr 10 '13 at 19:35
  • 2
    @Virtlink: For the purposes of C and C++, an `unsigned char` is a byte. The allowances in the standard for the number of bits in a `char` to vary are for old or esoteric platforms where the memory is organized in something like 9-bit units, not so that a C or C++ implementation can give you 16-bit `char` objects while addressing uses 8-bit units. – Eric Postpischil Apr 10 '13 at 19:40
  • @EricPostpischil, see section 5.2.10, specifically point 4:A pointer can be explicitly converted to any integral type large enough to hold it. The mapping function is implementation defined. [ Note: it is intended to be unsurprising to those who know the addressing structure of the underlying machine. — end note] – freddy.smith Apr 10 '13 at 19:49
  • 2
    @Virtlink Yes, a char doesn't need to be 8 bits, but you know what, you don't care. char is guaranteed to be the unit in which C++ measures sizes and thus the granularity of your system's addressing. And mixing code written for an 8-bit platform (and working at that low a level) with code for a 9-bit platform is hopefully something you're not planning to do. – Christian Rau Apr 10 '13 at 19:54
  • @Virtlink: correct - but even if you cast purely to `intptr_t` and then attempt arithmetic on that, it's unclear that the results would be guaranteed correct. What if you are on a NUMA machine, for example? The pointer may not even really be a pointer at all. – Nik Bougalis Apr 10 '13 at 19:55
  • @alan.dugdall: That does not show that converting a pointer to `unsigned char *` is defined by the C++ standard. – Eric Postpischil Apr 10 '13 at 19:57
  • @Virtlink If you're that keen on platform-independence to worry about non-8-bit chars, then casting pointers to ints (be it even `(u)intptr_t`s) and performing weird arithmetics is completely out of the question anyway. – Christian Rau Apr 10 '13 at 19:58
  • @ChristianRau Yeah, I get the comments on non-8-bit characters. So, assuming `unsigned char*` is correct, safe and portable; why are options 3 through 6 not portable or safe? Why is `intptr_t` _completely out of the question_, as it is just an integer type with normal arithmetic rules, right? – Daniel A.A. Pelsmaeker Apr 10 '13 at 20:02
  • 1
    @Virtlink Because the standard doesn't make any guarantees about casting to int, disturbing the int and casting back. The only thing you can do with a pointer cast to int is cast it back. Of course it will most probably work on any practical platform (in the same way any practical platform will have 8-bit chars), but it's really UB to use this pointer afterwards (and if you don't want to use it, then why add an offset anyway?). And in the end I don't even think anybody guarantees that the pointer converts into a byte address (again, on most practical platforms it will indeed do so). – Christian Rau Apr 10 '13 at 20:04
  • @EricPostpischil The way I read it, it is clearly stating you can convert a pointer to any integral type that is large enough to hold it, i.e. a pointer to another pointer. It might blow up if you try to access it, but you **can** do it. We'll just have to agree to disagree. – freddy.smith Apr 10 '13 at 20:05
  • @alan.dugdall: Pointers are not integral types. – Eric Postpischil Apr 10 '13 at 20:06
  • @EricPostpischil 5.2.10 7) A pointer to an object can be explicitly converted to a pointer to an object of different type. Except that converting an rvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified. – freddy.smith Apr 10 '13 at 20:12
  • @alan.dugdall: Yes, converting `ptr` to pointer to `unsigned char` and then converting back to pointer to `SomeType` is defined. This does not define converting, adding, and converting back, nor does it define converting to pointer to `unsigned char`, adding, and dereferencing. – Eric Postpischil Apr 10 '13 at 20:23
  • @EricPostpischil It does, as converting from 'pointer to T1' to 'pointer T2' is defined in the standard. The converted pointer is just a pointer to a type of T2, you can do whatever you like with it then as far as pointer arithmetic is concerned, it isn't a pointer to a T1 anymore. It might blow up if you get things wrong, but it's in the standard. – freddy.smith Apr 10 '13 at 20:31
  • @alan.dugdall: The passage you cited from the standard says that the **result** of a pointer conversion is unspecified **except** that you can convert it back. It does not tell us that the result of the forward conversion is a pointer to T2 that can be used the normal ways a pointer to T2 can be used. The only operation defined on this new pointer is that you can convert it back. Addition is not defined. – Eric Postpischil Apr 10 '13 at 21:01
13

Using reinterpret_cast (or a C-style cast) means circumventing the type system and is neither portable nor safe. Whether it is correct depends on your architecture. If you (must) do it, you imply that you know what you are doing, and you are basically on your own from then on. So much for the warning.

If you add a number n to a pointer of type T, you move this pointer by n elements of type T. What you are looking for is a type where 1 element means 1 byte.
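
The scaling rule can be made visible with a small sketch. The names `bytes_moved` and `check_pointer_scaling` are made up for this illustration; the helper measures, through `char` pointers, how many bytes `p + n` is away from `p`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes between p and p + n.
// p + n must stay inside (or one past the end of) the same array.
template <typename T>
std::ptrdiff_t bytes_moved(T* p, std::ptrdiff_t n) {
    const char* before = reinterpret_cast<const char*>(p);
    const char* after  = reinterpret_cast<const char*>(p + n);
    return after - before;
}

void check_pointer_scaling() {
    int  ints[4]  = {0, 1, 2, 3};
    char chars[4] = {'a', 'b', 'c', 'd'};
    // Adding 3 to an int* moves by 3 * sizeof(int) bytes ...
    assert(bytes_moved(ints, 3) == 3 * static_cast<std::ptrdiff_t>(sizeof(int)));
    // ... while adding 3 to a char* moves by exactly 3 bytes.
    assert(bytes_moved(chars, 3) == 3);
}
```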

From the sizeof section 5.3.3.1.:

The sizeof operator yields the number of bytes in the object representation of its operand. [...] sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1. The result of sizeof applied to any other fundamental type (3.9.1) is implementation-defined.

Note that there is no statement about sizeof(int), etc.

Definition of byte (section 1.7.1.):

The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined. [...] The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.
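
The two quoted passages can be checked at compile time. The following is a small sketch, with `bits_in` and `check_byte_model` as hypothetical names; `CHAR_BIT` from `<climits>` is the number of bits in a byte on the current implementation.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// sizeof counts in bytes, and the character types are one byte by definition.
static_assert(sizeof(char) == 1, "char is one byte");
static_assert(sizeof(signed char) == 1, "signed char is one byte");
static_assert(sizeof(unsigned char) == 1, "unsigned char is one byte");

// A byte holds at least 8 bits, but may hold more on unusual platforms.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: size of a type in bits on this implementation.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

void check_byte_model() {
    assert(bits_in<unsigned char>() == static_cast<std::size_t>(CHAR_BIT));
    assert(bits_in<int>() == sizeof(int) * CHAR_BIT);
}
```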

So, if sizeof returns the number of bytes and sizeof(char) is 1, then char has the size of one byte to C++. Therefore, char is logically a byte to C++ but not necessarily the de facto standard 8-bit byte. Adding n to a char* will return a pointer that is n bytes (in terms of the C++ memory model) away. Thus, if you want to play the dangerous game of manipulating an object's pointer bytewise, you should cast it to one of the char variants. If your type also has qualifiers like const, you should transfer them to your "byte type" too.

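    #include <cstddef>      // std::ptrdiff_t
    #include <type_traits>  // std::conditional, std::is_const, std::add_const, ...

    template <typename Dst, typename Src>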
    struct adopt_const {
        using type = typename std::conditional< std::is_const<Src>::value,
            typename std::add_const<Dst>::type, Dst>::type;
    };

    template <typename Dst, typename Src>
    struct adopt_volatile {
        using type = typename std::conditional< std::is_volatile<Src>::value,
            typename std::add_volatile<Dst>::type, Dst>::type;
    };

    template <typename Dst, typename Src>
    struct adopt_cv {
        using type = typename adopt_const<
            typename adopt_volatile<Dst, Src>::type, Src>::type;
    };

    template <typename T>
    T*  add_offset(T* p, std::ptrdiff_t delta) noexcept {
        using byte_type = typename adopt_cv<unsigned char, T>::type;
        return reinterpret_cast<T*>(reinterpret_cast<byte_type*>(p) + delta);
    }
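
A possible usage sketch follows. So that it compiles on its own, it restates `add_offset` in condensed form (const is carried over, volatile is left out for brevity). `Sample` and `sum_values` are hypothetical names invented for the illustration, which walks records laid out a fixed number of bytes apart.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Condensed restatement of add_offset: the byte type is const when T is const.
template <typename T>
T* add_offset(T* p, std::ptrdiff_t delta) noexcept {
    using byte_type = typename std::conditional<std::is_const<T>::value,
                                                const unsigned char,
                                                unsigned char>::type;
    return reinterpret_cast<T*>(reinterpret_cast<byte_type*>(p) + delta);
}

// Illustrative record type, assumed only for this example.
struct Sample {
    int value;
};

// Sums `value` over `count` records that are `stride` bytes apart.
// Works on const data because add_offset keeps the const qualifier.
int sum_values(const Sample* first, std::size_t count, std::ptrdiff_t stride) {
    assert(first != nullptr || count == 0);
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += first->value;
        first = add_offset(first, stride);
    }
    return total;
}
```

With an ordinary array the stride is simply `sizeof(Sample)`; a byte stride becomes useful when the spacing between records is only known at run time.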


2

Please note that NULL is special: adding an offset to it is dangerous.
reinterpret_cast can't remove const or volatile qualifiers, so a C-style cast is the more portable way.
Using reinterpret_cast with traits, as in @user2218982's answer, seems safer.

#include <cstddef>  // std::ptrdiff_t

template <typename T>
inline void addOffset( std::ptrdiff_t offset, T *&ptr ) { 
    if ( !ptr )
        return;
    ptr = (T*)( (unsigned char*)ptr + offset );
} 
where23
0

Mine isn't as elegant, but I hope it is more readable. char *helper_ptr; helper_ptr = (char *) ptr;

Then you can traverse byte-by-byte using helper_ptr.

ptr = (SomeType*)(((char*)ptr) + 1) will advance the ptr by sizeof(SomeType) instead of 1 byte.

-2

if you have:

myType *ptr;

and you do:

ptr+=3;

The compiler will most certainly increment your variable by:

3*sizeof(myType)

And it's the standard way to do it as far as I know.

If you want to iterate over let's say an array of elements of type myType that's the way to do it.

Ok, if you wanna cast do that using

myNewType *newPtr = reinterpret_cast<myNewType *>(ptr);

Or stick to plain old C and do:

myNewType *newPtr=(myNewType *) ptr;

And then increment

Mppl
  • 1
    I know how it works when you don't cast. I want to add _any_ byte offset (say, `0xABC` bytes) to a pointer of any type `MyType*` regardless of its type's size. If `MyType* ptr = (MyType*)0x1000` then I want to end up with `ptr == (MyType*)0x1ABC`. – Daniel A.A. Pelsmaeker Apr 10 '13 at 19:02
  • 1
    You should avoid using C-style casts in C++ for anything except perhaps numerical casts. The compiler will apply the first C++ cast that works except for `dynamic_cast`. One issue (among others) is that C-style casts can remove the constness of an object with no indication it's happening either in source or via a compiler diagnostic. – Captain Obvlious Apr 10 '13 at 19:28
  • 1
    How does this answer the question? – JBentley Apr 10 '13 at 19:31