7

I need to be able to access (read-only, no resizing involved or anything like that) the elements of a an std::vector via pointers. E.g.,

std::vector<int> foo(10);
int *ptr_begin = &foo[0];

So far so good, this is guaranteed to work in the current standard (23.3.6.1):

The elements of a vector are stored contiguously, meaning that if v is a vector where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().

So we can access all elements of the vector using pointers, since they are stored in a contiguous chunk of memory. But what about the one-past-the-last element? I mean, it is legal to perform this operation?

int *ptr_end = ptr_begin + foo.size()

(Note, I am not trying to access the past-the-last value, merely to define a pointer to it - an equivalent to foo.end() if you like). The standard mentions only accessing elements via pointer arithmetics, but clearly here we are not accessing any element.

(As a side note, the definition of the existence of a past-the-last something seems tightly connected to the base concept of array (see for instance 5.7/5), but throughout the standard it seems like the concepts of contiguous storage and array are used interchangeably. Am I reading wrong?)

bluescarni
  • 3,937
  • 1
  • 22
  • 33

3 Answers3

13

Yes, as long as you only use ptr_end in comparisons* and don't attempt to deference it, this is fine. Quoting from § 5.7 in the C++11 draft standard regarding additive operations with pointers (emphasis mine):

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

Similar provisions are listed for the relational operators in § 5.9:

If two pointers point to elements of the same array or one beyond the end of the array, the pointer to the object with the higher subscript compares higher.

As to whether vector's buffer counts as an array for the purposes of the above, § 8.3.4 states:

An object of array type contains a contiguously allocated non-empty set of N subobjects of type T.

which is consistent with what § 23.3.6.1 has to say about vector:

The elements of a vector are stored contiguously


Since pointers are iterators, this sort of thing is a handy trick for using standard library algorithms with an arbitrary block of memory as the input. For example, say you want to use the lower_bound algorithm, but your data is stored in an MFC CArray:

CArray<int> someInts;
// populate and check for empty
int* begin = someInts.GetData();
int* end = begin + someInts.GetSize(); // for bounds-checking only; don't dereference
int* answer = std::lower_bound(begin, end, 100);

*There are a couple other operations that are legal too; e.g. since you know your vector isn't empty, you could subtract one to get a pointer to the last element. The important thing is don't dereference.

dlf
  • 9,045
  • 4
  • 32
  • 58
  • 2
    You may wish to use foo.data() instead of &foo[0] to get the pointer to the beginning, as that will work correctly even for empty vectors. – Nevin May 13 '14 at 15:26
  • @Nevin: Why would `&foo[0]` not work correctly for empty vectors? (assuming, that is, that we're in the "`&*x` === `x`" camp, which may or may not be true :D http://stackoverflow.com/q/7346634/560648 http://stackoverflow.com/q/988158/560648) – Lightness Races in Orbit May 13 '14 at 15:31
  • Note that pointers *are* iterators, they don't need to be converted. – juanchopanza May 13 '14 at 15:41
  • Ugh. I didn't know that about C. I'm still not sure what happens if you have an empty vector where data() returns a non-nullptr.... – Nevin May 13 '14 at 16:09
  • &foo[0] on an empty vector is well defined while foo[0] is not. – knivil May 13 '14 at 16:10
  • @LightnessRacesinOrbit because `foo[i]` is undefined behavior for `i >= foo.size()`? The `&` happens after the undefined behavior, so cannot protect you from the nasal demons. – Yakk - Adam Nevraumont May 13 '14 at 20:07
  • One thing to be careful of -- `&*foo.end()` seems like such a useful way to get `foo.data()+foo.size()`, but technically is UB. Basically every implementation will work, except possibly instrumented debug implementations that catch the dereference of the past-the-end iterator... – Yakk - Adam Nevraumont May 13 '14 at 20:09
  • Having given this some thought, I believe that both &foo[0] and &foo.begin() both lead to undefined behavior on an empty vector (or any empty standard sequence container). Both foo[0] and foo.begin() return a reference, which it cannot legally do for an empty container. Even though C99 gives special dispensation for some pointers, that does not extend to class types. – Nevin May 13 '14 at 20:17
  • @Yakk I posted links to two questions that explore this, and there never seems to be a firm concensus. Forming a pointer one-past-the-end is perfectly reasonable; it's whether UB has already occured when forming such a pointer in this manner that's controversial, and last I checked required a DR to make unambiguous either way. I for one happen to agree with your interpretation, mind you. Still, you yourself state that this will work on essentially all implementations so Nevin's comment that it will not is, I think, still suspect. – Lightness Races in Orbit May 13 '14 at 22:32
  • @dlf Thanks, I am familiar with the quoted text in the standard, but the crux of the matter for me is if the underlying contiguous storage within the vector object can be considered as an "array" in the sense defined by the standard. – bluescarni May 13 '14 at 22:49
  • @bluescarni I actually investigated that angle quite a bit when I wrote my answer. Frustratingly, the standard doesn't come out and define the term "array" anywhere; at least not that I was able to find. But based on usage, they seem to just mean 1 or more contiguously-allocated objects of the same type. If so, `vector`'s buffer qualifies. – dlf May 13 '14 at 23:15
  • @dlf Eh, my experience while reading the standard has been pretty frustrating as well. My understanding is that "contiguous storage" is more or less interchangeable with "array". Thanks for the effort, I am marking as accepted for now. – bluescarni May 13 '14 at 23:31
  • @bluescarni No idea how I missed this before, but 8.3.4 says "An object of array type contains a contiguously allocated non-empty set of N subobjects of type T." – dlf May 13 '14 at 23:43
  • @dlf this seems to establish a relation array -> contiguous storage -- but not the other way around? – bluescarni May 14 '14 at 08:58
  • @bluescarni Yes; unfortunately in the absence of a formal definition of the term, I think we're stuck with inductive reasoning from the various examples of things the standard calls arrays. – dlf May 14 '14 at 11:59
1

You should remember about existence of std::vector::data(), which explicitly gives you the pointer to this continuous memory. So yes, read only operations are available (if you're not mixing them with other vector methods, which could do some reallocation etc. underneath).

I would even go further - changing contents of memory under std::vector::data() should also be legal.

zoska
  • 1,684
  • 11
  • 23
1

Yep, it is legal to form the address to the one-past-the-end element. It is not legal to deference it. The draft standard even has a footnote which notes that "an implementation need only provide one extra byte (which might overlap another object in the program) just after the end of the object in order to satisfy the “one past the last element” requirements". You are also allowed to do pointer math with it (at least all of the legal stuff that you could do if you were dealing with a pointer into the array).

Andre Kostur
  • 770
  • 1
  • 6
  • 15