fastest way to convert a std::vector to another std::vector

Question

What is the fastest way (if there is any other) to convert a std::vector from one datatype to another (with the idea to save space)? For example:

std::vector<unsigned short> ----> std::vector<bool>

we obviously assume that the first vector only contains 0s and 1s. Copying element by element is highly inefficient in case of a really large vector.

Conditional question: If you think there is no way to do it faster, is there a complex datatype which actually allows fast conversion from one datatype to another?

@NikolaiNFetissov: The OP wants to save memory. `std::vector` is specialized to be memory efficient. — Björn Pollex, Oct 26 '11 at 07:39
@ R. Martinho Fernandes: You got it!!! I do want magic, unfortainatly std::do_magic() does not seem to exist. — scigor, Oct 26 '11 at 08:02
@Björn, `vector` is a disaster. Even STL authors admit it was a mistake. — Nikolai Fetissov, Oct 26 '11 at 18:41

Nicol Bolas · Accepted Answer · 2018-05-05T19:48:32.357

37

std::vector<bool>

Stop.

A std::vector<bool> is... not. std::vector has a specialization for the use of the type bool, which causes certain changes in the vector. Namely, it stops acting like a std::vector.

There are certain things that the standard guarantees you can do with a std::vector. And vector<bool> violates those guarantees. So you should be very careful about using them.

Anyway, I'm going to pretend you said vector<int> instead of vector<bool>, as the latter really complicates things.

Copying element by element is highly inefficient in case of a really large vector.

Only if you do it wrong.

Vector casting of the type you want needs to be done carefully to be efficient.

If the the source T type is convertible to the destination T, then this is works just fine:

vector<Tnew> vec_new(vec_old.begin(), vec_old.end());

Decent implementations should recognize when they've been given random-access iterators and optimize the memory allocation and loop appropriately.

The biggest problem for non-convertible types you'll have for simple types is not doing this:

std::vector<int> newVec(oldVec.size());

That's bad. That will allocate a buffer of the proper size, but it will also fill it with data. Namely, default-constructed ints (int()).

Instead, you should do this:

std::vector<int> newVec;
newVec.reserve(oldVec.size());

This reserves capacity equal to the original vector, but it also ensures that no default construction takes place. You can now push_back to your hearts content, knowing that you will never cause reallocation in your new vector.

From there, you can just loop over each entry in the old vector, doing the conversion as needed.

edited May 05 '18 at 19:48

answered Oct 26 '11 at 07:41

Nicol Bolas

449,505
63
781
982

1

+1 for the vector-bool advice. Until I had a knock-down ... discussion ... here with someone a few days back, I didn't realise how ... questionable ... it was. The trick to get a full capacity vector without constructing any elements is also good. You just may want to consider using `char` as the underlying type rather than `int` (since `int` will be at _least_ as wide as `short`). But assuming that's just an _example_ showing how to copy efficiently, the answer's very good. – paxdiablo Oct 26 '11 at 07:46
Your answear ist pretty much exactly what i was looking for. Thanks. – scigor Oct 26 '11 at 08:07
1

-1 for warning about `vector` instead of suggesting the best and most obvious solution, which is the constructor that takes two iterators. Kibitzing about the details of the example belongs in the comments or at the end of the answer, not as a replacement for the best solution... – Steve Jessop Oct 26 '11 at 08:49
4

@SteveJessop: Your suggestion requires not only ignoring the dangers inherent in using `vector`, but it also requires an implicit conversion between the types in question. Which, in this example may be a `bool`, but the OP certainly seemed to suggest went beyond just this example. Writing an implicit conversion is not something to be done for every type. – Nicol Bolas Oct 26 '11 at 09:15
1

`push_back` means the vector needs to increment its endpoint on each iteration of the loop, which isn't needed with `resize`. It's not self-evident that this will be slower than initializing a large region of memory to a primitive value using `memset`. That answer may also depend on the ultimate size of the vector. Best to measure as the OP did if performance is critical. – Tim MB Oct 03 '19 at 13:06

score 23 · Answer 2 · answered Oct 26 '11 at 08:30

There's no way to avoid the copy, since a std::vector<T> is a distinct type from std::vector<U>, and there's no way for them to share the memory. Other than that, it depends on how the data is mapped. If the mapping corresponds to an implicit conversion (e.g. unsigned short to bool), then simply creating a new vector using the begin and end iterators from the old will do the trick:

std::vector<bool> newV( oldV.begin(), oldV.end() );

If the mapping isn't just an implicit conversion (and this includes cases where you want to verify things; e.g. that the unsigned short does contain only 0 or 1), then it gets more complicated. The obvious solution would be to use std::transform:

std::vector<TargetType> newV;
newV.reserve( oldV.size() );    //  avoids unnecessary reallocations
std::transform( oldV.begin(), oldV.end(),
                std::back_inserter( newV ),
                TranformationObject() );

, where TranformationObject is a functional object which does the transformation, e.g.:

struct ToBool : public std::unary_function<unsigned short, bool>
{
    bool operator()( unsigned short original ) const
    {
        if ( original != 0 && original != 1 )
            throw Something();
        return original != 0;
    }
};

(Note that I'm just using this transformation function as an example. If the only thing which distinguishes the transformation function from an implicit conversion is the verification, it might be faster to verify all of the values in oldV first, using std::for_each, and then use the two iterator constructor above.)

Depending on the cost of default constructing the target type, it may be faster to create the new vector with the correct size, then overwrite it:

std::vector<TargetType> newV( oldV.size() );
std::transform( oldV.begin(), oldV.end(),
                newV.begin(),
                TranformationObject() );

Finally, another possibility would be to use a boost::transform_iterator. Something like:

std::vector<TargetType> newV(
    boost::make_transform_iterator( oldV.begin(), TranformationObject() ),
    boost::make_transform_iterator( oldV.end(), TranformationObject() ) );

In many ways, this is the solution I prefer; depending on how boost::transform_iterator has been implemented, it could also be the fastest.

Thank you very much this was really informative. I wish i could also mark this answear as the right one. — scigor, Oct 26 '11 at 12:40

score 9 · Answer 3 · answered Oct 26 '11 at 07:36

9

You should be able to use assign like this:

vector<unsigned short> v;
//...
vector<bool> u;
//...
u.assign(v.begin(), v.end());

answered Oct 26 '11 at 07:36

Vlad

18,195
4
41
71

10

You can do this directly in the constructor: `std::vector u(b.begin(), v.end());`. – Björn Pollex Oct 26 '11 at 07:41
@BjörnPollex: that should be `u(v.begin(), v.end());` I think? – Chap Dec 04 '13 at 02:23
@Chap: Yes, there is a typo there. The comment is to old to be edited though. Thanks for spotting that. – Björn Pollex Dec 04 '13 at 09:41

score 4 · Answer 4 · answered Dec 07 '15 at 21:34

class A{... }
class B{....}
B convert_A_to_B(const A& a){.......}

void convertVector_A_to_B(const vector<A>& va, vector<B>& vb)
{
    vb.clear();
    vb.reserve(va.size());
    std::transform(va.begin(), va.end(), std::back_inserter(vb), convert_A_to_B);
}

score 3 · Answer 5 · edited May 23 '17 at 12:34

3

The fastest way to do it is to not do it. For example, if you know in advance that your items only need a byte for storage, use a byte-size vector to begin with. You'll find it difficult to find a faster way than that :-)

If that's not possible, then just absorb the cost of the conversion. Even if it's a little slow (and that's by no means certain, see Nicol's excellent answer for details), it's still necessary. If it wasn't, you would just leave it in the larger-type vector.

edited May 23 '17 at 12:34

Community

1
1

answered Oct 26 '11 at 07:36

paxdiablo

854,327
234
1,573
1,953

Of course it is better to use the proper datatype from the beginning. – scigor Oct 26 '11 at 07:58

AHelps · Answer 6 · 2011-11-03T04:06:52.113

First, a warning: Don't do what I'm about to suggest. It's dangerous and must never be done. That said, if you just have to squeeze out a tiny bit more performance No Matter What...

First, there are some caveats. If you don't meet these, you can't do this:

The vector must contain plain-old-data. If your type has pointers, or uses a destructor, or needs an operator = to copy correctly ... do not do this.
The sizeof() both vector's contained types must be the same. That is, vector< A > can copy from vector< B > only if sizeof(A) == sizeof(B).

Here is a fairly stable method:

vector< A > a;
vector< B > b;
a.resize( b.size() );
assert( sizeof(vector< A >::value_type) == sizeof(vector< B >::value_type) );
if( b.size() == 0 )
   a.clear();
else
   memcpy( &(*a.begin()), &(*b.begin()), b.size() * sizeof(B) );

This does a very fast, block copy of the memory contained in vector b, directly smashing whatever data you have in vector a. It doesn't call constructors, it doesn't do any safety checking, and it's much faster than any of the other methods given here. An optimizing compiler should be able to match the speed of this in theory, but unless you're using an unusually good one, it won't (I checked with Visual C++ a few years ago, and it wasn't even close).

Also, given these constraints, you could forcibly (via void *) cast one vector type to the other and swap them -- I had a code sample for that, but it started oozing ectoplasm on my screen, so I deleted it.

Jerry Miller · Answer 7 · 2016-04-01T15:01:33.330

#ifdef VECTOR_H_TYPE1
#ifdef VECTOR_H_TYPE2
#ifdef VECTOR_H_CLASS
/* Other methods can be added as needed, provided they likewise carry out the same operations on both */

#include <vector>

using namespace std;

class VECTOR_H_CLASS {
public:
        vector<VECTOR_H_TYPE1> *firstVec;
        vector<VECTOR_H_TYPE2> *secondVec;

        VECTOR_H_CLASS(vector<VECTOR_H_TYPE1> &v1, vector<VECTOR_H_TYPE2> &v2) { firstVec = &v1; secondVec = &v2; }
        ~VECTOR_H_CLASS() {}

        void init() { // Use this to copy a full vector into an empty (or garbage) vector to equalize them
                secondVec->clear();
                for(vector<VECTOR_H_TYPE1>::iterator it = firstVec->begin(); it != firstVec->end(); it++) secondVec->push_back((VECTOR_H_TYPE2)*it);
        }

        void push_back(void *value) {
                firstVec->push_back((VECTOR_H_TYPE1)value);
                secondVec->push_back((VECTOR_H_TYPE2)value);
        }

        void pop_back() {
                firstVec->pop_back();
                secondVec->pop_back();
        }

        void clear() {
                firstVec->clear();
                secondVec->clear();
        }
};
#undef VECTOR_H_CLASS
#endif
#undef VECTOR_H_TYPE2
#endif
#undef VECTOR_H_TYPE1
#endif

Now that I think about it, I wrote this for class and derived class pointers, so it could have limitations if used as a general-purpose tweak. Sorry! — Jerry Miller, Apr 01 '16 at 21:07

score 0 · Answer 8 · answered Oct 26 '11 at 07:33

0

Copying element by element is not highly inefficient. std::vector provides constant access time to any of its elements, hence the operation will be O(n) overall. You will not notice it.

answered Oct 26 '11 at 07:33

Daniel Daranas

22,454
9
63
116

Just because an operation is O(n) doesn't mean its impact is not noticeable. Or are you using linear lists to store data for random access ? – FrankH. Oct 26 '11 at 08:39

fastest way to convert a std::vector to another std::vector

8 Answers8

Linked

Related