I am new to C++ and often need to build a list of objects from file data. Since the amount of data may be huge, I want to avoid unnecessary copying.

class Tool
{
public:
  Tool(string fileName);
  vector<MyClass> read();            // 1)
  vector<MyClass*> read();           // 2)
  void read(vector<MyClass*>& vec);  // 3)
};

vector<MyClass> Tool::read()
{
  MyClass c;
  vector<MyClass> vec;

  c.a = 1;
  c.b = 2;
  vec.push_back(c);

  c.a = 3;
  c.b = 4;
  vec.push_back(c);

  ...

  return vec;
}

vector<MyClass*> Tool::read()
{
  MyClass* c;
  vector<MyClass*> vec;

  c = new MyClass();
  c->a = 1;
  c->b = 2;
  vec.push_back(c);

  c = new MyClass();
  c->a = 3;
  c->b = 4;
  vec.push_back(c);

  ...

  return vec;
}

void Tool::read(vector<MyClass*>& vec)
{
  MyClass* c;

  c = new MyClass();
  c->a = 1;
  c->b = 2;
  vec.push_back(c);

  c = new MyClass();
  c->a = 3;
  c->b = 4;
  vec.push_back(c);

  ...
}

In 1) the read will be quite expensive because there is a lot to copy. I have read somewhere that recent C++ versions can make those copies quite efficient, but I am not sure.

In 2) the copy operations should involve significantly less data. But at some point I have to delete the objects the pointers point to. I guess I need to do that somewhere outside of Tool, since I don't need Tool anymore after reading the file.

In 3) I can pass the destination vector to the read method by reference and push the pointers directly into it. This seems to be the best of the three ways of getting my vector of MyClass objects. And since I pass the vector to the read method myself, I am more aware that I have to delete its elements later.

Maybe someone knows better ways to deal with such scenarios?

Thanks much for comments and help.

Edit1: I added implementations as UnholySheep suggested.

Huhu Doe
  • You should also show the corresponding implementations of these functions, since those will also matter for performance – UnholySheep Feb 07 '23 at 19:43
  • You can basically either use (1) and move resulting vector or use `std::vector>`. – sklott Feb 07 '23 at 19:46
  • @sklott You shouldn't be writing something like `std::vector result; ... return std::move(result);` since even without the `std::move` the argument of `return` is moved and without `std::move` you've got a good chance of RVO kicking in, which isn't possible when using `std::move`. – fabian Feb 07 '23 at 19:55
  • @fabian you shouldn't use `std::move` when returning: https://stackoverflow.com/questions/14856344/when-should-stdmove-be-used-on-a-function-return-value – Alan Birtles Feb 07 '23 at 19:56
  • By the way, you forgot option 4: `void read(vector<MyClass>& vec);` - which guarantees that the vector will not have to be copied and avoids using pointers – UnholySheep Feb 07 '23 at 19:57
  • Yeah, "the amount of data may be huge" but is it? And does it matter? To this day, I've seen mountains of "optimized" code that's totally unreadable and not even remotely significant in edge cases. – Friedrich Feb 07 '23 at 19:59
  • @HuhuDoe Your implementation returning `vector<MyClass>` already makes sure there's no copy of the local vector happening; the result is moved, or the vector may even be created directly in the location where the return value is stored (RVO). The only things you could potentially improve here are calling `vec.reserve`, if you know the number of elements beforehand, and using `emplace_back` instead of `push_back` (assuming a suitable `MyClass` constructor exists). The latter is unnecessary if copying `MyClass` is inexpensive. – fabian Feb 07 '23 at 20:01
  • Thank you for the answers. So 1) is the way to go with optimizations by @fabian. – Huhu Doe Feb 07 '23 at 20:13
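
For reference, the approach the comments settle on (return by value, `reserve`, `emplace_back`) could look roughly like the sketch below. This is only an illustration under assumptions: `MyClass` here is a stand-in with a two-argument constructor that the question does not show, `readAll` and `expectedCount` are hypothetical names used in place of `Tool::read()`, and the values are hard-coded where real code would parse the file.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's MyClass, with a constructor
// added so that emplace_back can build elements in place.
struct MyClass {
    int a;
    int b;
    MyClass(int a_, int b_) : a(a_), b(b_) {}
};

// Hypothetical free function standing in for Tool::read(); the values are
// hard-coded where the real code would parse the file.
std::vector<MyClass> readAll(std::size_t expectedCount) {
    std::vector<MyClass> vec;
    vec.reserve(expectedCount);  // one allocation up front if the count is known

    vec.emplace_back(1, 2);      // constructs the element inside the vector
    vec.emplace_back(3, 4);

    return vec;                  // no std::move here: the vector is moved or elided
}
```

A caller would simply write `std::vector<MyClass> data = readAll(2);`. If the element count is not known beforehand, the `reserve` call can be dropped without changing the result.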

1 Answer

I did something interesting:

#include <iostream>
#include <string>
#include <vector>

// Don't do this in real code.
using namespace std;

std::vector<string> makeVec() {
    vector<string> vec;

    vec.push_back("first");
    vec.push_back("second");

    cout << "Address: " << &vec << endl;
    return vec;
}

int main(int, char **) {
    vector<string> vec = makeVec();
    cout << "Address: " << &vec << endl;
}

^ Foo
Address: 0x7ffdaeead8a0
Address: 0x7ffdaeead8a0

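Notice that the address of the vector is the same inside makeVec() and inside main(). This is because the compiler constructs the returned vector directly in the storage of main's vec (named return value optimization, NRVO), so the vector is neither copied nor moved.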
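
If you want to convince yourself that the elements are not copied on return either, one way is to count copy-constructor calls. The sketch below is just an illustration: `Tracked` and `makeTracked` are hypothetical names that are not part of the code above, and the `reserve` call is there so that no reallocation interferes with the count.

```cpp
#include <vector>

// Hypothetical element type that counts how often it is copy-constructed.
struct Tracked {
    static int copies;
    int value;

    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    Tracked(Tracked&& other) noexcept : value(other.value) {}
};

int Tracked::copies = 0;

// Same shape as makeVec(): fill a local vector and return it by value.
std::vector<Tracked> makeTracked() {
    std::vector<Tracked> vec;
    vec.reserve(2);       // avoid reallocation while filling
    vec.emplace_back(1);  // built in place, no temporary Tracked
    vec.emplace_back(2);
    return vec;           // elided or moved; elements are not copied either way
}
```

After `std::vector<Tracked> v = makeTracked();` the counter `Tracked::copies` should still be zero, whether the compiler elides the return or falls back to moving the vector.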

But then I changed main:

int main(int, char **) {
    vector<string> vec = makeVec();
    cout << "Address: " << &vec << endl;

    cout << endl;
    vector<string> vec2;
    vec2 = makeVec();
    cout << "Address2: " << &vec2 << endl;
}

Running this version:

Address: 0x7fff5b4febf0
Address: 0x7fff5b4febf0

Address: 0x7fff5b4fec30
Address2: 0x7fff5b4fec10

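So, from this: if you're returning an object of any type and you can declare and initialize the variable in one statement, that's more efficient, because the compiler can build the result directly in your variable. With a separate assignment, makeVec() fills a different object (note the two different addresses in the second pair of lines), which is then move-assigned into vec2. For a vector that move is cheap, but it is still extra work. Do this: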

 vector<string> vec = makeVec();

Do NOT do this:

 vector<string> vec2;
 vec2 = makeVec();
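
To put the cost of the two-step form in perspective: assigning from the returned temporary is a move assignment, so with the default allocator I would expect vec2 to take over the heap buffer rather than copy the strings. The sketch below checks that by comparing `data()` pointers. `makeVecReportingBuffer` and `assignmentReusesBuffer` are hypothetical helpers written for this illustration, not part of the program above.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of makeVec() that also reports where the vector's
// heap buffer lives, via the out-parameter `buffer`.
std::vector<std::string> makeVecReportingBuffer(const std::string*& buffer) {
    std::vector<std::string> vec;
    vec.push_back("first");
    vec.push_back("second");
    buffer = vec.data();  // address of the element storage, not of the vector object
    return vec;
}

// Returns true when the assigned-to vector ends up owning the very buffer
// that was filled inside the function, i.e. the elements were not copied.
bool assignmentReusesBuffer() {
    const std::string* bufferInside = nullptr;
    std::vector<std::string> vec2;
    vec2 = makeVecReportingBuffer(bufferInside);  // move assignment from a temporary
    return vec2.data() == bufferInside;
}
```

So the vector object itself sits at a different address in the two-step form, but the element storage is handed over, which is why the difference is usually small.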
Joseph Larson