2

In my C++ code,

vector <string> strVector = GetStringVector();
vector <int> intVector = GetIntVector();

So I combined these two vectors into a single one,

void combineVectors(vector<string>& strVector, vector <int>& intVector, vector < pair <string, int>>& pairVector)
{

       for (int i = 0; i < strVector.size() || i < intVector.size(); ++i )
       {
             pairVector.push_back(pair<string, int> (strVector.at(i), intVector.at(i)));
       }
}

Now this function is called like this,

vector <string> strVector = GetStringVector();
vector <int> intVector = GetIntVector();
vector < pair <string, int>> pairVector
combineVectors(strVector, intVector, pairVector);
//rest of the implementation

The combineVectors function uses a loop to add the elements of other 2 vectors to the vector pair. I doubt this is a efficient way as this function gets called hundrands of times passing different data. This might cause a performance issue because everytime it goes through the loop.

My goal is to copy both the vectors in "one go" to the vector pair. i.e., without using a loop. Am not sure whether that's even possible.

Is there a better way of achieving this without compromising the performance?

  • What do you mean by "in one go"? Did you actually profile your code? – melpomene Mar 02 '19 at 01:40
  • I mean, without going through the loop. –  Mar 02 '19 at 01:43
  • 2
    Before you attempt to optimize for performance, make sure the code works right. The shown code will exhibit undefined behavior, with a likely crash, when the arrays are not equal in size. – Sam Varshavchik Mar 02 '19 at 01:43
  • Arrays are guaranteed to be of the same size. –  Mar 02 '19 at 01:44
  • 3
    Since you have to physically move data, you are going to need a loop of some form. At least, you can resize the result vector before to avoid memory allocation in the loop. If you don't actually need a vector of pairs, you could also implement an iterator that mimics this (takes to vectors and iterates both while exposing the data as a pair). – Nico Schertler Mar 02 '19 at 01:46
  • Looks like a Python `zip` function: https://stackoverflow.com/questions/8511035/sequence-zip-function-for-c11 – tnt Mar 02 '19 at 01:58
  • If you don't need to access the paired data as a std::vector then you could move the data without loop into your own class and then implement the relevant access operators on that class. You could also implement your own std::container but that's going a bit far. – J.R. Mar 02 '19 at 02:06
  • `pairVector.push_back(pair (strVector.at(i), intVector.at(i)));` Too wordy -- `pairVector.push_back({strVector.at(i), intVector.at(i)})` – PaulMcKenzie Mar 02 '19 at 02:13
  • @Kris *Arrays are guaranteed to be of the same size* -- Is this by word of mouth, or is there code that actually makes sure this guarantee holds up? – PaulMcKenzie Mar 02 '19 at 02:14
  • @PaulMcKenzie - Yes it's guaranteed. –  Mar 02 '19 at 02:31
  • That really didn't answer the question. Guaranteed by a code check or guaranteed by someone telling you it's guaranteed? For the latter, people lie. Never trust what someone tells you when it comes to these types of guarantees. – PaulMcKenzie Mar 02 '19 at 02:33
  • @PaulMcKenzie - There's a code check –  Mar 04 '19 at 23:54

3 Answers3

0

You have clarified that the arrays will always be of equal size. That's a prerequisite condition.

So, your situation is as follows. You have vector A over here, and vector B over there. You have no guarantees whether the actual memory that vector A uses and the actual memory that vector B uses are next to each other. They could be anywhere.

Now you're combining the two vectors into a third vector, C. Again, no guarantees where vector C's memory is.

So, you have really very little to work with, in terms of optimizations. You have no additional guarantees whatsoever. This is pretty much fundamental: you have two chunks of bytes, and those two chunks need to be copied somewhere else. That's it. That's what has to be done, that's what it all comes down to, and there is no other way to get it done, other than doing exactly that.

But there is one thing that can be done to make things a little bit faster. A vector will typically allocate memory for its values in incremental steps, reserving some extra space, initially, and as values get added to the vector, one by one, and eventually reach the vector's reserved size, the vector has to now grab a new larger block of memory, copy everything in the vector to the larger memory block, then delete the older block, and only then add the next value to the vector. Then the cycle begins again.

But you know, in advance, how many values you are about to add to the vector, so you simply instruct the vector to reserve() enough size in advance, so it doesn't have to repeatedly grow itself, as you add values to it. Before your existing for loop, simply:

pairVector.reserve(pairVector.size()+strVector.size());

Now, the for loop will proceed and insert new values into pairVector which is guaranteed to have enough space.

A couple of other things are possible. Since you have stated that both vectors will always have the same size, you only need to check the size of one of them:

for (int i = 0; i < strVector.size(); ++i )

Next step: at() performs bounds checking. This loop ensures that i will never be out of bounds, so at()'s bound checking is also some overhead you can get rid of safely:

    pairVector.push_back(pair<string, int> (strVector[i], intVector[i]));

Next: with a modern C++ compiler, the compiler should be able to optimize away, automatically, several redundant temporaries, and temporary copies here. It's possible you may need to help the compiler, a little bit, and use emplace_back() instead of push_back() (assuming C++11, or later):

    pairVector.emplace_back(strVector[i], intVector[i]);

Going back to the loop condition, strVector.size() gets evaluated on each iteration of the loop. It's very likely that a modern C++ compiler will optimize it away, but just in case you can also help your compiler check the vector's size() only once:

int i=strVector.size();
for (int i = 0; i < n; ++i )

This is really a stretch, but it might eke out a few extra quantums of execution time. And that pretty much all obvious optimizations here. Realistically, the most to be gained here is by using reserve(). The other optimizations might help things a little bit more, but it all boils down to moving a certain number of bytes from one area in memory to another area. There aren't really special ways of doing that, that's faster than other ways.

Sam Varshavchik
  • 114,536
  • 5
  • 94
  • 148
0

We can use std:generate() to achieve this:

#include <bits/stdc++.h>
using namespace std;

vector <string> strVector{ "hello", "world" };
vector <int> intVector{ 2, 3 };

pair<string, int> f()
{
    static int i = -1;
    ++i;
    return make_pair(strVector[i], intVector[i]);
}

int main() {
    int min_Size = min(strVector.size(), intVector.size());
    vector< pair<string,int> > pairVector(min_Size);
    generate(pairVector.begin(), pairVector.end(), f);
    for( int i = 0 ; i < 2 ; i++ )
        cout << pairVector[i].first <<" " << pairVector[i].second << endl;    
}
Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
Shakil
  • 4,520
  • 3
  • 26
  • 36
  • 2
    You can use a lambda with `std::generate()`, eg: `generate(pairVector.begin(), pairVector.end(), [&, i = -1]{ ++i; return make_pair(strVector[i], intVector[i]); });` – Remy Lebeau Mar 02 '19 at 02:37
  • @Remy Lebeau - Your code giving error at ++i. Expression must be modifiable lvalue. –  Mar 04 '19 at 23:51
  • @Shakil - How will this be used in C++ world. For e.g., what if the vectors are inside main() function itself. How will that be called in pair f() function? I see that, f() cannot take any arguments. I think lambda is the right way to go! –  Mar 04 '19 at 23:53
  • @Kris try adding `mutable` to the lambda: `[&, i = -1] mutable -> pair { ++i; return make_pair(strVector[i], intVector[i]); }` – Remy Lebeau Mar 05 '19 at 02:14
  • @Kris in Shakik's examine, `generate()` calls `f()` and copies the return value into the `vector`, `f()` does not modify the `vector` directly. Using a lambda instead does not change that. Notice the lambda doesn't take any parameters, either. I suggest you read up on how `generate()` actually works – Remy Lebeau Mar 05 '19 at 02:14
  • @Remy Lebeau - adding mutable doesn't work either. It's complaining about missing '{' in lambda body.... –  Mar 05 '19 at 02:44
  • @Kris I forgot the parameter list is not optional when using `mutable`: `[&, i = -1]() mutable { ... }` [This works](https://www.ideone.com/mz2U12) – Remy Lebeau Mar 05 '19 at 03:33
0

I'll try and summarize what you want with some possible answers depending on your situation. You say you want a new vector that is essentially a zipped version of two other vectors which contain two heterogeneous types. Where you can access the two types as some sort of pair?

If you want to make this more efficient, you need to think about what you are using the new vector for? I can see three scenarios with what you are doing.

  1. The new vector is a copy of your data so you can do stuff with it without affecting the original vectors. (ei you still need the original two vectors)
  2. The new vector is now the storage mechanism for your data. (ei you no longer need the original two vectors)
  3. You are simply coupling the vectors together to make use and representation easier. (ei where they are stored doesn't actually matter)

1) Not much you can do aside from copying the data into your new vector. Explained more in Sam Varshavchik's answer.

3) You do something like Shakil's answer or here or some type of customized iterator.

2) Here you make some optimisations here where you do zero coping of the data with the use of a wrapper class. Note: A wrapper class works if you don't need to use the actual std::vector < std::pair > class. You can make a class where you move the data into it and create access operators for it. If you can do this, it also allows you to decompose the wrapper back into the original two vectors without copying. Something like this might suffice.

class StringIntContainer {

   public:

          StringIntContaint(std::vector<std::string>& _string_vec, std::vector<int>& _int_vec) 
              : string_vec_(std::move(_string_vec)), int_vec_(std::move(_int_vec))
           {
               assert(string_vec_.size() == int_vec_.size());
           }

          std::pair<std::string, int> operator[] (std::size_t _i) const
          {
                return std::make_pair(string_vec_[_i], int_vec_[_i]);
          }

          /* You may want methods that return reference to data so you can edit it*/

          std::pair<std::vector<std::string>, std::vector<int>> Decompose()
          {
                return std::make_pair(std::move(string_vec_), std::move(int_vec_[_i])));
          }
   private:
          std::vector<std::string> _string_vec_;
          std::vector<int> int_vec_;

};
Bar Stool
  • 620
  • 3
  • 13