0

I have been having trouble trying to understand how shared pointers work from the perspective of member variables. There are a lot of good questions on SO that explain shared pointers and how/when to use them (this, this, and this, among others). However, I've been struggling to understand how they work when we are dealing with class instance variables, particularly those that are containers (like std::vector).

I have a workflow that involves

  1. Reading data from a file
  2. Storing that data in a container in a class
  3. Accessing, modifying, and passing around that data in instances of that containing class

I am trying to understand if (and where) my code is doing needless copying, or if my passing of a constant reference is good enough. After doing some reading, I am inclined to think that I should be creating the objects once on the heap and passing shared pointers around rather than constant references, but I can't articulate why.

I have pasted parts of my program below in (roughly) their current implementation. My questions are:

  • For the PointCloud class, should I instead be storing shared pointers to Point3D objects in the _data vector? This way, during the filtering operation, I can populate my _inliers and _outliers PointCloud variables by filling their respective _data member variables with shared pointers to the appropriate Point3D objects?
  • For Scene class and the Filter class, am I better off with making the PointCloud type member variables shared pointers to PointCloud? I am pretty sure my current getter implementations return copies, and these PointCloud objects can contain millions of Point3D objects, so they're not small. Would this be better than returning a constant reference (as I do in PointCloud::getData()?

Ultimately I think the root of my question comes down to: should I do nearly all of my Point3D object creation on the heap and just store shared pointers in my container class member variables? Essentially this would be using a sort of "shared memory" model. If so, what would the efficiency improvement be? If not, why not?


Here are some components of my program.

(Note: in reality implementations are not actually all inlined. I just did that here for brevity).

scene.h

#include "pointcloud.h"
#include "point3D.h" // just defines the Point3D class

class Scene
{
    private:
        // Other member variables...

        /** Pointcloud */
        PointCloud _pointcloud;

    public:
        // ...Public functions...

        inline PointCloud getPointCloud() const; // Return point cloud
        inline void readPoints3D(const std::string &path_to_file);
};

PointCloud Scene::getPointCloud() const { return _pointcloud; }

void Scene::readPoints3D(const std::string &path_to_file)
{
    std::ifstream file(path_to_file.c_str());

    // Run through each line of the text file
    std::string line;
    std::string item;
    while (std::getline(file, line))
    {
        // Initialize variables id, x, y, z, r, g, b from file data

        Point3D pt(id, x, y, z, r, g, b); // Create Point3D object

        _pointcloud.addPoint(pt); // Add point to pointcloud
    }
    file.close();
}

pointcloud.h

#include "point3D.h"

#include <vector>

class PointCloud
{
    private:
        /** PointCloud data */
        std::vector<Point3D> _data;

    public:
        // Public member functions
        inline std::vector<Point3D> & getData();
        inline const std::vector<Point3D> & getData() const;
        inline void addPoint(const Point3D &pt);
};

const std::vector<Point3D> & PointCloud::getData() const { return _data; }

std::vector<Point3D> & PointCloud::getData() { return _data; }

void PointCloud::addPoint(const Reprojection::Point3D &pt)
{
    _data.push_back(pt); // Add the point to the data vector
}

filter.h

#include "pointcloud.h"

// Removes 3D points from a pointcloud if they don't meet certain conditions
class Filter
{
    private:

        // Good points that should stay in the pointcloud
        PointCloud _inliers;

        // Removed points
        PointCloud _outliers;

    public:
        Filter(const PointCloud &pointcloud) // Constructor

        void filterPointCloud(); // Filtering operation

        inline PointCloud getInliers();
        inline PointCloud getOutliers();
}

Filter::Filter(const PointCloud &pointcloud) :
    _inliers(pointcloud), _outliers(PointCloud()) {}

PointCloud getInliers() { return _inliers; }
PointCloud getOutliers() { return _outliers; }

driver.cpp

#include "scene.h"
#include "pointcloud.h"
#include "filter.h"

// Some function that writes my pointcloud to file
void writeToFile(PointCloud &cloud);

int main()
{
    Scene scene;

    scene.readPoints3D("points3D_file.txt");

    PointCloud cloud = scene.getPointCloud();

    Filter f(cloud);
    f.filterPointCloud();

    writeToFile(f.getInliers());
    writeToFile(f.getOutlier());
}
Community
  • 1
  • 1
marcman
  • 3,233
  • 4
  • 36
  • 71
  • "should I do nearly all of my object creation on the heap for these classes and just store shared pointers in my member variables?" Almost certainly not - you should try to avoid heap allocation wherever possible. Pass by reference avoids any copying overhead. –  Mar 02 '17 at 16:23
  • @NeilButterworth: But how can one do that when an object goes out of scope? For example in my `readPoints3D` function I am storing it as a member variable. If I wanted a more generic implementation that returns a `PointCloud` object, wouldn't I run into [this problem](http://stackoverflow.com/questions/6441218/can-a-local-variables-memory-be-accessed-outside-its-scope) if I try to return a reference? – marcman Mar 02 '17 at 16:36
  • @NeilButterworth: And also, what about the issue of my `Filter` class? How can I populate the `_inliers` and `_outliers` variables without copying all the `Point3D` objects from the input `PointCloud`? – marcman Mar 02 '17 at 16:39
  • If possible, return references, as you already do with getData (not a great name) in your PointCloud class. –  Mar 02 '17 at 16:42
  • `shared_ptr` is for shared ownership in a multithreaded application. You did not mention anything about that. Therefore .... do not use `shared_ptr` here. If you want to avoid copies then do not copy. – knivil Mar 02 '17 at 17:04
  • @knivil: I believe most of the explanations I've read would disagree. It's for shared ownership, but not necessary restricted to multithreaded applications. If you have multiple objects in scope that operate on the same other object, I believe you may want to use a shared_ptr to ensure that it doesn't get destroyed before your other objects are done with it, no? – marcman Mar 02 '17 at 17:09
  • And I believe you just want to use `shared_ptr` without any real reason or because everybody hypes it so much. There was excactly one situation where I needed a reference counted pointer. And I write a lot of C++ for fun and profit. See: https://meetingcpp.com/index.php/br/items/shared_ptr-addiction.html with the biggest drawback for me: it will force you and any other user of this code, to use std or/and boost::shared_ptr. – knivil Mar 02 '17 at 17:23
  • 1
    @marcman try using values and references. If that don't suits your needs and you really need heap allocation, then use `std::unique_ptr`. Only really special cases need `std::shared_ptr`. – Guillaume Racicot Mar 02 '17 at 17:27

2 Answers2

2

When you have a lifetime problem, one piece of advice you get is "use shared_ptr". Once you take that advice, you have two problems.

First, you are now using shared_ptr, so getting it to express your lifetime problem. And you still have your original lifetime problem.

shared_ptr expresses the concept that the lifetime of a particular bit of data should be the union of the lifetimes of all of the shared_ptrs to that bit of data. This is a complex description of the lifetime of an object.

Sometimes you actually need a complex object lifetime description. At other times you do not.

Keeping your code as simple as it can be, when you don't need complex object lifetimes, don't use pointers.

If your code is simple and functionally pure, there is little need for pointers. If you have entangled state, then you may need pointers, but avoiding that entangled state is a better solution than making the pointer and lifetime management extremely complex.

Use values when you can; values are simple, easy to reason about, and permit scaling. Avoid using object identity (ie, its memory location) as anything meaningful to your system as much as you can.

There are great reasons to use shared_ptr, but simply "allocate everything on the heap and use shared_ptr and don't worry about object lifetime" leads to unmaintainable spaghetti of dependent stats, leaks, crashes, random hidden dependencies between components, unpredictable object lifetimes, and abysmal performance due to cache incoherence.

Full scale garbage collection systems still have serious and complex object lifetime issues and serious locality problems; and shared_ptr is not a full scale garbage collection system.

Cœur
  • 37,241
  • 25
  • 195
  • 267
Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
1

It all comes down to object lifetimes, and who controls them. The beauty of std::shared_ptr is that it handles that for you automatically; when the last shared_ptr to an object is destroyed, the object itself is destroyed along with it. As long as a shared_ptr to an object exists, it will still be valid.

A reference (const or not) does not have that guarantee. It is quite easy to generate a reference to an object that goes out of scope. However, if you can guarantee through other means that the object lifetime will exceed that of the reference, a reference is more flexible and more efficient than shared_ptr.

One place where you can make the guarantee of an object lifetime is when you're calling a function with the object as a parameter. The object will exist until the end of the function, unless the function tries to do something crazy like delete it. The only time you would need a shared_ptr as a parameter is when the function tries to save the pointer for later use.

Mark Ransom
  • 299,747
  • 42
  • 398
  • 622