17

Is it safe (in theory or in practice) to reinterpret_cast a std::pair<T1, T2> const & into a std::pair<T1 const, T2> const &, assuming that the programmer hasn't intentionally done something weird like specializing std::pair<T1 const, T2>?

user541686
  • 5
    I love the two blank deleted answers from hi-rep C++ers, haha. – GManNickG Jan 11 '13 at 05:32
  • 6
    I hate `std::map` too ;) – Potatoswatter Jan 11 '13 at 05:33
  • @GManNickG: I deleted my answer because the question doesn't have enough context, as to why he needs this in the first place. What he is trying to accomplish. Without knowing the purpose, my answer was just a guess, therefore I deleted it. – Nawaz Jan 11 '13 at 05:33
  • 2
    Though I have to ask (agreeing with Nawaz): what for? – GManNickG Jan 11 '13 at 05:34
  • 3
    @GManNickG Because `std::map`, as a container, doesn't let you dictate what `value_type` is. It must be `std::pair< key const, mapped >`. Used properly, it intrudes into your program design and violates separation of concerns. (Therefore, I suspect the most practical answer would explain how to adapt `std::set` or whatever.) But the issue is real enough, and common enough in a design already "locked into" `std::map`, that this isn't an XY problem. – Potatoswatter Jan 11 '13 at 05:36
  • My "guess" with only cursory research: I doubt this is technically valid since the types are nominally incompatible. That said, the layout of `std::pair` is specified by the standard and that layout would not be adversely affected in practice were you to do this. I would love to provide a full answer to this but (a) it's late, and (b) I'm not sure that I can. Will revisit. – Lightness Races in Orbit Jan 11 '13 at 05:44
  • 1
    @Potatoswatter: Hahhaha you read my mind xD – user541686 Jan 11 '13 at 05:45
  • 1
    @LightnessRacesinOrbit It's definitely an issue of layout, and even `int` and `int const` are not *layout-compatible.* (http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_closed.html#1334) – Potatoswatter Jan 11 '13 at 05:45
  • For the record: http://stackoverflow.com/a/3639069/560648 – Lightness Races in Orbit Jan 11 '13 at 05:45
  • 1
    @Potatoswatter: Oh, they're not? Feck it then, this is just plain wrong to do. :) (That looks like the answer to me, then.) – Lightness Races in Orbit Jan 11 '13 at 05:45
  • For those wondering why: I'm trying to insert into an `std::map` whose keys are expensive to copy (they're sets) and I'm trying to stay C++03 compatible... and I want to only calculate the position in the tree once, so that only leaves one overload of `insert` which I can use. I'm doing this with some `swap` trickery that in the end requires this piece of code to work (if you try it you'll see what I mean). I do realize there's other ways to get around this (use `set`s of custom types, etc.) but since this is easy and looks safe at first glance I'm wondering if this hack is good enough. – user541686 Jan 11 '13 at 05:47
  • @Mehrdad If you can use TR1, try `reference_wrapper`. It's a class which contains a pointer and implicitly converts to reference type. It might work, but it tends to make things more fragile. – Potatoswatter Jan 11 '13 at 05:49
  • @Potatoswatter: If I use TR1 I might as well use C++11 :) I'm trying to stick with C++03 so that some people I might give my code to later who I know use older versions of GCC can use it. Although I'll take a look at that class, maybe it only needs C++03 features... (does it need move semantics?) – user541686 Jan 11 '13 at 05:50
  • @Mehrdad then you can try a class which contains a pointer and dereferences it by implicit conversion ;v) there's nothing magic there. https://ideone.com/ZhyyS1 – Potatoswatter Jan 11 '13 at 05:57
  • @Potatoswatter: Oh, but the thing is, I don't want to change the type of the container (I'd have to go through the entire code and change quite a lot of things)... that's why I didn't want to switch to `set`s either. And it gets pretty ugly since I can't say `iterator->first.blah` or `pair.first.blah` now, which I'm doing all over the place in my code. – user541686 Jan 11 '13 at 05:58
  • @Mehrdad: Which insert overload do you use, that won't copy the argument anyway? Care to give us a bit more code and where you think the problem is with a canonical plain insertion? – Arne Mertz Jan 11 '13 at 08:35
  • @ArneMertz: It isn't the exact code, but it's similar to what I'm doing... see [here](http://liveworkspace.org/code/iP8LC$4). The point is that making the pair shouldn't copy the data inside the container (a vector in this example). Notice that if I can do what I'm asking here, then the buffer gets re-used, so there's only ever 1 copy made, and that's the one inside the tree -- no extra temporary copies anywhere else. – user541686 Jan 11 '13 at 08:53
  • Seems you can't just reinterpret_cast safely. Maybe it works if you declare the `pair` and const_cast the vector for the swapping, so no reinterpret_cast is needed for the insertion. Alternatively you could rely on optimizations: `m.insert(m.end(), make_pair(v, v.size()));` - the temporary pair should be optimized away: [see here](http://liveworkspace.org/code/4m1j0F$7) – Arne Mertz Jan 11 '13 at 09:15
  • @ArneMertz: Isn't `const_cast` even worse? Maybe I don't understand but you can't cast away const-ness for a const object; you can only cast away const-ness for a const reference to a mutable object. Regarding the temporary being optimized away: it might be *moved* in C++11, but as far as I understand it can't be optimized away in C++03. What makes you think it can be optimized away? (Also, if you're using `make_pair` with explicit type arguments like that then you should just use a temporary `pair` instead.) – user541686 Jan 11 '13 at 09:23
  • If the optimizer sees that the temporary just gets created, copy-constructed into the map value and destroyed, it can rearrange the assembler code to omit the temporary altogether. However, I am not sure if it can detect that sequence if the temporary is not trivial to construct/copy/destroy – Arne Mertz Jan 11 '13 at 09:35
  • @ArneMertz: I guess in theory it could work, but I haven't seen that intense of an optimization before. :\ If you have an example where that happens I'd love to see it. – user541686 Jan 11 '13 at 09:39
  • 2
    You could just try constructing one from the other `std::pair q(p.first, p.second)` and see if the compiler eliminates copying. Or use a smart pointer. – Maxim Egorushkin Jan 11 '13 at 09:49
  • 1
    @MaximYegorushkin: You're *way* too hopeful. :-) Even the allocations in `for (;;) { std::vector(1000000); }` don't get optimized away on my C++03 compiler (Visual C++ 2008)... – user541686 Jan 11 '13 at 10:01
  • possible duplicate of http://stackoverflow.com/q/14227983/726361 which is followed by [template metafunction for detecting template specialisations](http://stackoverflow.com/questions/14244082/template-metafunction-for-detecting-template-specialisations) – Seth Carnegie Jan 11 '13 at 19:58

3 Answers

8

It's NOT portable to do so.

The requirements on std::pair are laid out in clause 20.3. Clause 17.5.2.3 clarifies that:

Clauses 18 through 30 and Annex D do not specify the representation of classes, and intentionally omit specification of class members. An implementation may define static or non-static class members, or both, as needed to implement the semantics of the member functions specified in Clauses 18 through 30 and Annex D.

This implies that it's legal (although incredibly unlikely) for an implementation to include a partial specialization such as:

template<typename T1, typename T2>
struct pair   // primary template: no argument list after the name
{
    T1 first;
    T2 second;
};

template<typename T1, typename T2>
struct pair<const T1, T2>
{
    T2 second;
    const T1 first;
};

which are clearly not layout-compatible. Other variations including inclusion of additional non-static data members possibly before first and/or second are also allowed under the rule.
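To make the difference concrete, here is a minimal sketch that compares where `first` ends up under the two member orders shown above. The structs `forward_order` and `reversed_order` and the helper functions are hypothetical stand-ins invented for illustration; they are not part of any standard library implementation.

```cpp
#include <cassert>   // assert, for the usage shown after the block
#include <cstddef>   // offsetof, std::size_t

// Hypothetical stand-ins for the two member orders shown above.
struct forward_order  { int first; double second; };
struct reversed_order { double second; const int first; };

// Byte offset of `first` in each stand-in.
inline std::size_t first_offset_forward()  { return offsetof(forward_order, first); }
inline std::size_t first_offset_reversed() { return offsetof(reversed_order, first); }

// Returns true when `first` lives at different offsets in the two structs,
// which is enough to make a reinterpret_cast between them unusable.
inline bool layouts_differ() {
    return first_offset_forward() != first_offset_reversed();
}
```

With these definitions, `assert(layouts_differ());` holds, because `first` is the initial member of one struct and follows a `double` in the other.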


Now, it is somewhat interesting to consider the case where the layout is known. Although Potatoswatter pointed out DR1334 which asserts that T and const T are not layout-compatible, the Standard provides enough guarantees to allow us to get most of the way anyway:

template<typename T1, typename T2>
struct mypair
{
    T1 first;
    T2 second;
};

mypair<int, double> pair1;
mypair<int, double>* p1 = &pair1;
int* p2 = reinterpret_cast<int*>(p1); // legal by 9.2p20
const int* p3 = p2;
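// the target is const-qualified because reinterpret_cast cannot cast away the const on p3
const mypair<const int, double>* p4 = reinterpret_cast<const mypair<const int, double>*>(p3); // again 9.2p20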

However this doesn't work on std::pair as we can't apply 9.2p20 without knowing that first is actually the initial member, which is not specified.
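If you want to find out what a particular implementation actually does, one option is to compare addresses at run time. The following is only a sketch: `first_is_initial_member` is a hypothetical helper name, it assumes `T1` does not overload unary `operator&`, and a `true` result describes the library you compiled against, not something the Standard promises.

```cpp
#include <cassert>   // assert, for the usage shown after the block
#include <utility>   // std::pair

// Returns true if `first` starts at the same address as the pair itself,
// i.e. it is the initial member on this particular implementation.
// Assumes T1 does not overload unary operator&.
template <typename T1, typename T2>
bool first_is_initial_member(const std::pair<T1, T2>& p) {
    return static_cast<const void*>(&p) == static_cast<const void*>(&p.first);
}
```

For example, given `std::pair<int, double> a(1, 2.0);` and `std::pair<const int, double> b(1, 2.0);`, both `assert(first_is_initial_member(a));` and `assert(first_is_initial_member(b));` are expected to pass on common implementations.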

Ben Voigt
  • Iiiiinteresting... so is that the only barrier? If I verify on my implementation that the order isn't changed (and the compiler isn't changing it internally, etc.) then am I good to go? – user541686 Jan 11 '13 at 19:42
  • @Mehrdad: Well, 9.2p20 only applies to the initial member... but I think if you used 9.2p19 in conjunction, you've ruled out any funny business that would cause it to break. It may be that formally you'd need a chain of `reinterpret_cast` applications as shown above, adding `const` to each member in reverse order (so that each application leaves a shorter common initial subsequence). – Ben Voigt Jan 11 '13 at 19:47
  • 1
    But `first` and `second` are not private members (per the title of section 17.5.2.3), and they are not marked `// exposition only`. That indicates to me that `pair` is a special case. – ecatmur Jan 11 '13 at 19:58
  • @ecatmur: If the rule were meant only to apply to members with private visibility, it would say that **in the rule**. It does not. However, even if you restrict it that way, allowing the insertion of private non-static members into `std::pair` would change the layout, and could do so differently in different specializations. I think it's best though to believe the rule says what it means: That the class definitions in clause 20 do not constrain the layout, they are only a list of members required to exist. – Ben Voigt Jan 11 '13 at 20:19
  • **−1** E.g. C++11 §26.4 very specifically specifies the layout of a `complex`, **contradicting** the assumption in this answer that §17.5.2.3 is an absolute thing. So that assumption of absolute implementation freedom provably does not hold. With that in mind, §20.3.2 directly specifies the layout (modulo padding after first member) of a `std::pair`, not marked as exposition. – Cheers and hth. - Alf Aug 09 '16 at 06:07
  • @Alf: You've found the exception that very much proves the rule. The fact that the semantics surrounding treatment of `std::complex` as an array of `T[2]` **have to be explicitly stated** strongly supports the interpretation that 17.5.2.3 applies to each and every class definition listed in clauses 18 through 30. – Ben Voigt Aug 09 '16 at 13:02
  • @Raeynd: Well, then you would have an initial common subsequence, and you could use it where the Standard allows access to an initial common subsequence. So basically, inside a `union` yes, but `reinterpret_cast` no. – Ben Voigt Sep 16 '16 at 15:39
  • @Raeynd The narrow character types exception to the strict aliasing rule only operates in one direction. – Ben Voigt Sep 16 '16 at 17:36
  • @Raeynd: The example you linked returns a pointer that matches exactly the dynamic type of the object that exists there, so strict aliasing requirements are met. You are making a pointer to a different class entirely. If the type is layout-compatible, then the target type of the pointer you use for access is a cv-qualified version of the actual dynamic type of the subobject at that location, making aliasing ok. But then why did you talk about the `unsigned char` buffer? `unsigned char` is not relevant to the case you discuss. – Ben Voigt Sep 16 '16 at 19:29
  • Sorry I was referring to the unsigned char array of std::aligned_storage. Basically I wanted to do something like [this](http://pastebin.com/CgYf8eQy). – Raeynd Sep 16 '16 at 20:04
6

pair is defined in section 20.3.2 of the standard to have data members:

template <class T1, class T2>
struct pair {
    T1 first;
    T2 second;
};

This means that for concrete types T1, T2, pair<T1, T2> and pair<const T1, T2> are guaranteed to have respective data members:

struct pair<T1, T2> {
    T1 first;
    T2 second;
};
struct pair<const T1, T2> {
    const T1 first;
    T2 second;
};

Now, if T1 and T2 are both standard-layout, then pair<T1, T2> and pair<const T1, T2> are both standard-layout. As discussed above, by DR1334 they are not layout-compatible (3.9p11), but by 9.2p19 they can be reinterpret_cast to their respective T1 or const T1 first member. By 9.2p13 the T2 second member must be located after the first member (i.e. with higher address) and by 1.8p5 must be located immediately after the first member such that the object is contiguous after accounting for alignment (9.2p19).

We can check this using offsetof (which is defined for standard-layout types):

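// typedefs keep the template commas out of the offsetof macro's argument list
typedef pair<T1, T2> P;
typedef pair<const T1, T2> CP;
static_assert(offsetof(P, second) == offsetof(CP, second), "!");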

Since pair<T1, T2> and pair<const T1, T2> have the same layout, casting in the forward direction and using the result to access the members is valid by 3.9.2p3:

If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.

So the reinterpret_cast is safe only if std::is_standard_layout<std::pair<T1, T2>>::value is true.
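As a sketch of how that condition could be enforced, the cast can be wrapped in a helper that refuses to compile when the checks fail. This assumes C++11 (for `static_assert` and `<type_traits>`), and `as_const_key` is a hypothetical name rather than anything from the standard library. The checks only establish that the two specializations look the same on the implementation at hand; whether accessing members through the result is well-defined is exactly what the comments and the other answer dispute.

```cpp
#include <cassert>      // assert, for the usage shown after the block
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_standard_layout
#include <utility>      // std::pair

// Hypothetical helper: views a pair<T1, T2> as a pair<const T1, T2>,
// but only compiles if both specializations pass the layout checks.
template <typename T1, typename T2>
const std::pair<const T1, T2>& as_const_key(const std::pair<T1, T2>& p) {
    // Aliases keep the template commas out of the offsetof macro arguments.
    typedef std::pair<T1, T2>       from_type;
    typedef std::pair<const T1, T2> to_type;

    static_assert(std::is_standard_layout<from_type>::value &&
                  std::is_standard_layout<to_type>::value,
                  "both pair types must be standard-layout");
    static_assert(sizeof(from_type) == sizeof(to_type),
                  "pair sizes differ");
    static_assert(offsetof(from_type, first) == offsetof(to_type, first) &&
                  offsetof(from_type, second) == offsetof(to_type, second),
                  "member offsets differ");

    return reinterpret_cast<const to_type&>(p);
}
```

Given `std::pair<int, double> p(1, 2.5);`, the call `as_const_key(p)` yields a reference to the same address as `p`, so `assert(static_cast<const void*>(&as_const_key(p)) == static_cast<const void*>(&p));` passes.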

ecatmur
  • 2
    Did you see Potatoswatter's comment and linked defect report on the question, asserting that `T1` and `const T1` are in fact not layout-compatible? I'm not sure I buy that argument, but you need some evidence to support your claim that they are layout-compatible. – Ben Voigt Jan 11 '13 at 19:17
  • @BenVoigt ah, missed that. I think we can manage without them being layout-compatible, though. – ecatmur Jan 11 '13 at 19:59
  • I don't think 1.8p5 strictly requires `std::pair<T1, T2>` and `std::pair<const T1, T2>` to have equal size, equal padding, equal `offsetof`, etc. The two types could have different amounts of padding and still each "occupy contiguous bytes of storage". – aschepler Jan 11 '13 at 20:07
  • @aschepler: Not if they have the same members and are *standard-layout*. But I think they don't necessarily have the same members, per 17.5.2.3 – Ben Voigt Jan 11 '13 at 22:16
0

The practical answer is that casting to const should be safe since you are reinterpret-casting to an object with an identical representation. However, the other way around introduces undefined behaviour (const to non-const).

As for the "theoretical" answer, I should note that the C++ standard does not guarantee an identical bitwise representation of const/non-const objects. The const keyword guarantees "conceptual constness", which is implementation dependent.

  • Of course it guarantees identical representation of `const` and non-`const` objects. `const T&` can bind to either one. But this doesn't preclude a specialization of `std::pair` with different layout when one or both of the template type parameters are marked `const`. Perhaps there's other language that does? – Ben Voigt Jan 11 '13 at 18:33
  • The pair is not specialized, as mentioned in the question. –  Jan 12 '13 at 05:54
  • That's not what the question says. All it says is that the *programmer* hasn't defined a specialization. The standard library implementer may have. – Ben Voigt Jan 12 '13 at 08:59