0

My class dynamically allocates an array of integers (in the code below named this_data), but the array can only be filled up to a maximum length (this_data_maxlen). I would like to add data (add_data) to the array, but the adding array could exceed the maximum length.

Knowing that there is no reallocate for new keywords, I wrote one version that uses memcpy and another version that uses vector, as that was suggested to be the most efficient way. However when testing, the vector version turns out the be 5 times slower than the memcpy. Am I doing something wrong in the vector version?

Memcpy version:

unsigned int this_data_len=50;
unsigned int this_data_maxlen=70;           //maximum length of existing dataset
unsigned int add_data_len=50;

int *this_data = new int[this_data_len];    //existing dataset
int *add_data = new int[add_data_len];      //data we want to add to existing dataset

//function here that puts values in them

unsigned int new_data_len=min(this_data_maxlen,this_data_len+add_data_len);     //calculate the new dataset length (=7)
int *new_data=new int[new_data_len];        //create the new dataset

//build the new 'this_data'
memcpy(new_data,this_data,this_data_len*sizeof(int));   //copy existing dataset values to new dataset
memcpy(new_data+this_data_len,add_data,(this_data_maxlen-this_data_len)*sizeof(int));       //fill up the new dataset with a selection of the data to add
delete [] this_data;                        //remove original dataset
this_data=new_data;                         //set the new dataset

//build the new 'add_data'
add_data_len=add_data_len-(this_data_maxlen-this_data_len); //update the add_data length (=2)
new_data=new int[add_data_len];
memcpy(new_data,add_data+(this_data_maxlen-this_data_len),add_data_len*sizeof(int));
delete [] add_data;
add_data=new_data;

this_data_len=new_data_len;                 //set the new dataset length

//clean up
delete [] this_data;
delete [] add_data;

Vector version:

unsigned int this_data_len=50;
unsigned int this_data_maxlen=70;           //maximum length of existing dataset
unsigned int add_data_len=50;

vector<int> this_vector(this_data_len);
vector<int> add_vector(add_data_len);

//function here that puts values in them

unsigned int new_data_len=min(this_data_maxlen,this_data_len+add_data_len);     //calculate the new dataset length (=7)
this_vector.reserve(new_data_len);
this_vector.insert(this_vector.end(),add_vector.begin(),add_vector.begin()+(this_data_maxlen-this_data_len));
add_vector=vector<int>(add_vector.begin()+(this_data_maxlen-this_data_len),add_vector.end());

add_data_len=add_data_len-(this_data_maxlen-this_data_len); //update the add_data length (=2)
this_data_len=new_data_len;                 //set the new dataset length
DoubleYou
  • 1,057
  • 11
  • 25
  • Why not use `std::vector`? – Captain Obvlious May 16 '14 at 02:50
  • There is no `new` operator that is analogous to `realloc`. Check out this SO post. http://stackoverflow.com/questions/2557866/reallocating-memory-via-new-in-c – R Sahu May 16 '14 at 03:04
  • 1
    You're using the wrong form of delete (`delete this_data` instead of `delete [] this_data`). I suggest you use `std::vector` -- the `vector` class is designed to do what you're doing in the most optimal way possible (if we assume the compiler library authors are top-notch programmers, which they usually are). – PaulMcKenzie May 16 '14 at 03:11
  • @PaulMcKenzie - thanks for noting the `delete` issue. The initial reason I'm not using vector is because the data is loaded and stored as a single block from a binary file - until just now I found [this answer](http://stackoverflow.com/a/2780397/2710064) that allows vectors anyway. – DoubleYou May 16 '14 at 03:42
  • Part (2) is certainly "Yes - use a vector". For part (1): The vector (with standard allocator) cannot do a `realloc`-like operation, it still will `new` a new block and copy or move the contents when the vector grows beyond capacity. To get realloc-like behaviour you will have to implement your own container that uses `realloc`. AFAIK it's not possible to write a standard allocator that can realloc without moving. – M.M May 16 '14 at 03:47
  • 1) Use `vector::erase` instead of `add_vector=vector` 2) Did you checked time in release or in debug mode ? – borisbn May 16 '14 at 05:19
  • When some people say that `vector` is hugely slower than their hand rolled pointer stuff, chances are that either they have unnecessarily copied the vector multiple times, or that they don't turn on optimization. – Siyuan Ren May 16 '14 at 05:21
  • Thanks @borisbn - and I indeed forgot that debug version is slower. Runs at similar speeds now in release. – DoubleYou May 16 '14 at 05:29

1 Answers1

0

Your tests do not seem equivalent. In particular, this line may be doing a lot more work than you intended:

add_vector = vector<int>(add_vector.begin() + (this_data_maxlen - this_data_len),
                         add_vector.end());

At the very minimum, this line is likely to be creating a temporary vector, deleting the contents of the original vector, and possibly doing a copy of the temporary vector's contents into the original vector before deleting it. (With RVO, or with the move semantics of C++11, some of that may be avoided.)

And all this is to implement the effect of

add_vector.erase(add_vector.begin(),
                 add_vector.begin() + (this_data_maxlen - this_data_len));

As another comment, if you're always adding data to the end and removing it form the beginning, you may be better off using a std::queue or std::deque instead of a std::vector.

(After I wrote this up, I saw that borisbn and C.R. have already provided the information on using vector::erase hours before me. Hats off to you both.)

Michael Urman
  • 15,737
  • 2
  • 28
  • 44