3

I am trying to make a copy of a vector of string and append it to the end of its original vector, i.e. duplicating its contents. Example:

 Input : vector<string> s = {"abc", "def"}
 Output: vector<string> s = {"abc", "def", "abc", "def"}

I was using the insert method, i.e.

s.insert(s.end(), s.begin(), s.end());

However, this exhibits compiler-dependent results. In, LLVM clang, it gave me the expected answer.

With GCC it gave me

 Output: vector<string> s = {"abc", "def", "", ""}

I am wondering why this happens and what's the safest way to achieve this vector duplication goal?

Here is the ideone.com link for the program above: http://ideone.com/40CH8q

CrepuscularV
  • 983
  • 3
  • 12
  • 25
  • 3
    It seems like UB if the iterators are invalid. Call `reserve` to guarantee noninvalidation. – Kerrek SB Feb 15 '14 at 19:55
  • Which version of g++ are you using? Using g++ 4.8.1, I get the correct output. – user3264405 Feb 15 '14 at 19:56
  • I was using the same g++4.8.1. I've attached a link to the online compiler to show the problem. – CrepuscularV Feb 15 '14 at 20:02
  • 3
    @KerrekSB Is this really **guaranteed**? I thought that any insertion or deletion invalidates the iterators (that means there is not any guarantee that they are still valid, although they *can* be) – leemes Feb 15 '14 at 20:04
  • @KerrekSB, `reserve` won't help you get rid of UB, because insertion invalidates all the iterators pointing past the insertion point including the end iterator. – ach Feb 15 '14 at 20:05
  • This is actually pretty interesting. You declare the vector with vector sv = {"abc", "def"}; while I define it with vector s; s.push_back("abc"); s.push_back("def"); For some reason, the way I define it, somehow it works. It may not be a safe operation though (ie. it might just work by luck). In addition, if I use const char * instead of string both work. – user3264405 Feb 15 '14 at 20:10
  • @user3264405 When initialized, it most probably allocates the *exact* space required to hold the values, as it's known immediately. When you push elements after initialization, it most probably follows a strategy to not require a new allocation for every insertion (that means, it can for example first allocate space for 0 (when initialized empty), then 1, then 4, then 16 elements). Such a strategy is required for *amortized* constant time for `push_back` – leemes Feb 15 '14 at 20:12
  • @AndreyChernyakhovskiy: Good point about `end()`. It may be necessary to exclude the last element and perform a final push-back separately. – Kerrek SB Feb 15 '14 at 20:17
  • @leemes: Insertion doesn't invalidate the iterators before the point of insertion unless a reallocation happens. Deletion never invalidates the iterators before the point of deletion. – Benjamin Lindley Feb 15 '14 at 20:21
  • @KerrekSB, I am failing to see how this trick may help. – ach Feb 15 '14 at 20:23
  • I can't find the most recent standard, but the '98 standard explicitly requires in [lib.sequence.reqmts], table 67: 'i,j are not iterators into a'. I am pretty sure that's never changed. – ach Feb 15 '14 at 20:27
  • @AndreyChernyakhovskiy: You're right, it's still UB because it violates the container requirements. I think I was making a lesser point that insertion at the end does not invalidate non-end iterators if it does not resize, which seems to be true according to C++11 23.3.6.5. – Kerrek SB Feb 15 '14 at 20:38
  • @KerrekSB Please check my answer. Does that invoke UB? – Ali Feb 15 '14 at 23:00
  • 1
    @Ali: Interesting; I can't see anything obviously UB about it. You're exploiting the fact that the vector's storage is contiguous... – Kerrek SB Feb 16 '14 at 00:29
  • @Ali: Sure, but don't be surprised if someone turns up a fault with it eventually. It's a subtle point, and in real-world code I would always demand something like leemes's answer, simply for readability's sake. Every part of it is amenable to instant inspection, and note that it would even be correct without the `reserve`. – Kerrek SB Feb 16 '14 at 00:48
  • @KerrekSB Yes, I would most likely also go with his answer; and yes, I know that `reserve()` is just an optimization in his case. However, in my highly biased opinion, my solution is just as readable as his because my solution is without explicit loop. I find it funny that many high-rep users are having such a long discussion on how to append a vector to itself. These are the moments when C++ looks overly complicated. – Ali Feb 16 '14 at 00:56
  • @Ali: It's not actually that surprising, when you think about how a vector has to work. And there was actually a defect report (considered NAD) about whether `v.push_back(v.back())` is legal, because justifying it from the wording of the standard is not entirely straight-forward. I mean, it *is* actually straight-forward, but there are very legitimate reasons to question its validity. – Kerrek SB Feb 16 '14 at 01:08
  • @KerrekSB Yes, that's what I am talking about :) Even (seemingly) simple things are overly complicated... What does NAD stand for? I googled it but couldn't figure it out. Not A Defect? – Ali Feb 16 '14 at 01:15

4 Answers4

5

Although it can possibly be done with iterators, a safe alternative is to avoid them:

size_t size = v.size();  // Of course we shouldn't access .size() in the loop...
v.reserve(size * 2);     // Preallocation. Thanks @Ali for this performance hint
for (size_t i = 0; i < size; ++i)
    v.push_back(v[i]);

In general, working with iterators while also modifying the data structure (not only its elements) is dangerous; you should read carefully when iterators are invalidated and when it's safe to reuse old iterators after a modification. Thus, it sometimes makes sense to use the "old" method to iterate through a random-access sequence: using an index variable.

leemes
  • 44,967
  • 21
  • 135
  • 183
  • 1
    Just checked and answered there; thank you for your hint, it's indeed a good idea in general to force pre-allocation if you know how many insertions will follow for sure. – leemes Feb 15 '14 at 23:42
  • 1
    How do you see it done with iterators? – ach Feb 16 '14 at 01:04
4

As others have already pointed out, you need to make sure by calling vector::reserve() that the vector doesn't get reallocated during insertion. (It is also a good idea to call reserve if you chose to put the elements with push_back() into your vector.)

Then there is still the iterator invalidation issue (as detailed under vector::insert()) but the code below, as far as I know, bypasses that:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

using namespace std;

int main() {

  vector<string> s{"abc", "def"};

  auto size = s.size();

  s.reserve(2*size); // <-- This one is essential to ensure
                     //     no reallocation during insertion

  const string* first = s.data(); // using pointer, instead of iterator

  copy(first, first+size, back_inserter(s));

  for (const auto& e : s)
    cout << e << '\t';

  cout << endl;
}
Ali
  • 56,466
  • 29
  • 168
  • 265
  • From cppreference.com: "The past-the-end iterator is also invalidated." <-- That's the problem. If I understand it correctly, it doesn't make a difference if you write `s.end()` or `s.begin() + s.size()`; both are "past-the-end iterators". The only bypass would be to treat the first or last element different, also introducing a special case for empty vectors. In my opinion, it's not worth the effort since it's less readable than using plain old index access. – leemes Feb 15 '14 at 23:41
  • @leemes Thanks for checking. However, note that `s.data()` yields a *pointer*, not an iterator. So `s.data() + s.size()` is not equivalent to `s.end()`. – Ali Feb 15 '14 at 23:45
  • Oh, I read your code too fast. In this case it might indeed work. – leemes Feb 15 '14 at 23:50
  • @leemes Yeah, that often happens to me too. :( Updated the code, made the type explicit and also added a comment. Hopefully, the next fast reader won't miss the point(er). – Ali Feb 15 '14 at 23:58
0

As already stated, it seems all you need to do is make a call to reserve:

sv.reserve(sv.size() * 2);

This is because if a reallocation occurs, the iterators are invalidated. You can check this for yourself:

std::cout << sv.capacity() << "\n"; // 2

The new size will be bigger than the capacity so a reallocation occurs.

  • 4
    As far as I understand it, the end iterator might still be invalidated. – Kit Fisto Feb 15 '14 at 20:04
  • @Kit Maybe that's only for `push_back`. But I'm no expert. –  Feb 15 '14 at 20:10
  • 1
    Unfortunately, the end iterator is invalidated by any insert, even at the end, even if there's no reallocation. That's because it's after the insertion point. So although I don't think there's any serious reason why it *shouldn't* have defined behavior, using the end iterator to define the range that gets inserted is not (AFAIK) defined behavior. – Steve Jessop Feb 15 '14 at 21:46
  • @SteveJessop Please check my answer. Does that invoke UB? – Ali Feb 15 '14 at 23:03
  • @Ali: not sure. According to the standard the pointer `first+size` is invalidated, but you'd think that just means it can't be used to access an element any more (not that it could before, being an off-the-end pointer). Certainly in practice, being "invalidated" from the POV of the `vector` can't magically make the pointer no longer equality-comparable with other pointers :-) – Steve Jessop Feb 16 '14 at 11:47
  • @SteveJessop Getting more and more confused. In any case, thanks for checking! – Ali Feb 16 '14 at 13:04
0

This can be done this way too

vector<string> data = {"abc", "def"}; // Input String Vector
int len = data.size();
for(int i = 0; i < len; i++)
{
    data.push_back(data[i]); // Making Copy of String Vector
}

vector<string> data = {"abc", "def", "abc", "def"};
//Resultant String Vector
Shravan40
  • 8,922
  • 6
  • 28
  • 48