7

It may surprise some coders and, as surprising as it can be, it is not possible to implement std::vector without non-standard support of the compilers. The problem essentially resides on the ability to perform pointer arithmetic on a raw storage region. The paper, p0593: Implicit creation of objects for low-level object manipulation, that appears in @ShafikYaghmour answer, exposes clearly the problematic and proposes modification of the standard in order to make implementation of vector like container and other law level programming technique easier.

Nevertheless I was wondering if there were no work around to implement a type equivalent to std::vector only using what is provided by the language without any use of the standard library.

The objective is to construct vector elements, one by one, in a raw storage region, and be able to access those elements using an iterator. This would be equivalent to a sequence of push_back on a std::vector.

To get an idea of the problem, bellow a simplification of the operations that are performed on the implementation of std::vector in the libc++ or libstdc++:

void access_value(std::string x);

std::string s1, s2, s3;
//allocation
auto p=static_cast<std::string*>(::operator new(10*sizeof(std::string)));

//push_back s1
new(p) std::string(s1);
access_value(*p);//undefined behavior, p is not a pointer to object

//push_back s2
new(p+1) std::string(s2);//undefined behavior
        //, pointer arithmetic but no array (neither implicit array of size 1)
access_value(*(p+1));//undefined behavior, p+1 is not a pointer to object

//push_back s2
new(p+2) std::string(s3);//undefined behavior
        //, pointer arithmetic but no array
access_value(*(p+2));//undefined behavior, p+2 is not a pointer to object

My idea is to use a union that never initialize its member.

//almost trivialy default constructible
template<class T>
union atdc{
  char _c;
  T value;
  atdc ()noexcept{ }
  ~atdc(){}
};

The raw storage will be initialized with an array of this union type, and pointer arithmetic is always performed on this array. Then elements are constructed on the inactive member of the union at each push_back.

std::string s1, s2, s3;
auto p=::operator new(10*sizeof(std::string));
auto arr = new(p) atdc<std::string>[10];
//pointer arithmetic on arr is allowed

//push_back s1
new(&arr[0].value) std::string(s1); //union member activation
access_value(arr[0].value);

//push_back s2
new(&arr[1].value) std::string(s2);
access_value(arr[1].value);

//push_back s2
new(&arr[2].value) std::string(s2);
access_value(arr[2].value);

Is there any undefined behavior in this code above?

Shafik Yaghmour
  • 154,301
  • 39
  • 440
  • 740
Oliv
  • 17,610
  • 1
  • 29
  • 72

1 Answers1

9

This is topic that is under active discussion, we can see this in the proposal p0593: Implicit creation of objects for low-level object manipulation. This is a pretty solid discussion of the issues and why they are not fixable without changes. If you have different approaches or strong views on the approaches being considered you may want to reach out to the proposals authors.

It includes this discussion:

2.3. Dynamic construction of arrays

Consider this program that attempts to implement a type like std::vector (with many details omitted for brevity):

....

In practice, this code works across a range of existing implementations, but according to the C++ object model, undefined behavior occurs at points #a, #b, #c, #d, and #e, because they attempt to perform pointer arithmetic on a region of allocated storage that does not contain an array object.

At locations #b, #c, and #d, the arithmetic is performed on a char*, and at locations #a, #e, and #f, the arithmetic is performed on a T*. Ideally, a solution to this problem would imbue both calculations with defined behavior.

  1. Approach

The above snippets have a common theme: they attempt to use objects that they never created. Indeed, there is a family of types for which programmers assume they do not need to explicitly create objects. We propose to identify these types, and carefully carve out rules that remove the need to explicitly create such objects, by instead creating them implicitly.

The approach using the adc union has the problem that we expect to be able to access the contained data via a pointer T* i.e. via std::vector::data. Accessing the union as a T* would violate the strict aliasing rules and therefore be undefined behavior.

Shafik Yaghmour
  • 154,301
  • 39
  • 440
  • 740
  • And don't you think the trick I propose get around the problem of pointer arithmetic on allocated storage, or there is an undefined behavior I missed? – Oliv Oct 25 '18 at 20:24
  • @Oliv I believe you have [strict aliasing issues](https://stackoverflow.com/a/51228315/1708801) since the underlying that of the vector should accessible as the stored data type i.e. via `data()` method – Shafik Yaghmour Oct 25 '18 at 20:42
  • When the first C and C++ Standards were written, the authors had no reason to think anyone would care about whether it actually defined the behaviors of constructs that were obviously useful and unanimously supported. They thus adopted behavioral models which can be made "workable" only by ignoring the fact that certain necessary operations are branded as UB. Unfortunately, by failing to explicitly recognize that different implementations are used for different purposes, some of which require support for more operations than others, the Standard has fragmented the language... – supercat Oct 26 '18 at 21:12
  • 2
    ...into two categories of dialects, one of which is unsuitable for purposes involving low-level programming, and the other of which supports only limited optimization. According to the C Rationale, the Spirit of C includes the principle "Don’t prevent the programmer from doing what needs to be done". Optimizing based on the idea that a programmer won't do X is only helpful in those cases where the programmer doesn't need to do X. Unfortunately, most arguments about defined behaviors completely lose sight of that. – supercat Oct 26 '18 at 21:20