Contiguous Memory

Question

So i'm performing the same function on a large set of objects. To my understanding, it's better to keep this set of objects in contiguous memory space for better time.

std::vector<MyObject> myObjects;
// Fill vector with thousands of objects
for( auto const& obj listMyObjects ) { obj.doThing(); }

Note: myObjects lives for the lifetime of the application and is changing constantly.

In other parts of my code, I want to work directly with a single instance of MyObject. Normally I'd wrap my vector with a class like MyObjectManager and make a getter:

MyObject * obj = MyObjectManager::getObject( "OBJECT_NAME" );
obj->setName( "NEW_NAME" );

But this is error-prone as the pointer may be invalidated if the vector is resized. This leaves me to using an index or id instead, and effect all changes through the the manager

int objId = MyObjectManager::getObject( "OBJECT_NAME" );
MyObjectManager::setObjectName( objId, "NEW_NAME" );

But I don't feel this is the best way as it tightly couples my manager and object class. So I thought about creating an interface for objects that is a friend of the manager:

class MyObject
{
    std::string myName;
}

class MyObjectInterface
{
    int myId;

    MyObjectInterface( int id ) { myId = id; }
    void setName( std::string name ) { MyObjectManager::myObjects.at( myId ).name = name; }
}

class MyObjectManager
{
    friend class MyObjectInterface;
    static std::vector<MyObject> myObjects;

    static MyObjectInterface getObject( std::string name ) { return MyObjectInterface( myObjects.find( name ) ); // Psuedo code }
}

MyObjectInterface obj = MyObjectManager::getObject( "OBJECT1" );
obj.setName( "OBJECT2" );

Note: Excuse issues in the code. This is more about the idea then it is about the code.

While this seems like it'll work, I'm curious if I've reasoned about the issue correctly and whether this solution is over-complicating things? Would love some insight, thanks.

Do you have multiple threads modifying the vector simultaneously? If not then you don't have to worry about using pointers to objects in the vector, as long as you dereference them before you do something to the vector which might cause a reallocation. — Some programmer dude, Jul 24 '17 at 13:17
In addition, _"the pointer may be invalidated if the vector is resized"_ - can the needed capacity change so drastically that you don't want to `.reserve()` the maximum up-front, thus avoiding reallocation? — underscore_d, Jul 24 '17 at 13:19
Are you going to be doing `for( auto const& obj listMyObjects ) { obj.doThing(); }` a lot? If not you might consider using a `std::list` which do not invalidate iterators/references to other elements. [This](https://stackoverflow.com/questions/45152511/remove-while-iterating-stdvector-indirectly) might also help. — NathanOliver, Jul 24 '17 at 13:28
*"To my understanding, it's better to keep this set of objects in contiguous memory space for better time."* Have you benchmarked, or are you basing this design entirely on an assumption? — cdhowie, Jul 24 '17 at 13:42
if your code is sensitive to performance issues calling something like myObjects.find( name ) on unsorted sequential data is a serious performance hit. Using map or unordered map would be a much better choice additionally freeing you from the problems with container reallocation. — jszpilewski, Jul 24 '17 at 13:53
@Someprogrammerdude Yeah there *will* be multiple threads wanting to access the set. I'm not going to tackle synchronization in this post, but if it seems like the answer here depends highly on it then I'm ready to flesh out the details on that. — jakedipity, Jul 24 '17 at 14:08
@underscore_d The number of objects can't be known ahead of time. I'm building a framework with different uses everywhere so there's no telling even if there will be only a few objects or a lot of objects. — jakedipity, Jul 24 '17 at 14:08
@NathanOliver That line of code will happen a lot, but the structure definitely is not settled yet. I'll consider that! — jakedipity, Jul 24 '17 at 14:08
@cdhowie It's not an assumption for general memory: [Iterating Contiguous Memory](https://stackoverflow.com/questions/22703663/c-performance-of-vector-of-pointer-to-objects-vs-performance-of-objects). I am; however, still in the process of designing my system so I can't robustly test the implications and how other downfalls e.g. having to copy the entire vector on resize will have on my framework. — jakedipity, Jul 24 '17 at 14:19
@JacobHull Right, but remember that the bottleneck for your application might be elsewhere, or the performance impact may not be significant enough to warrant this kind of optimization. — cdhowie, Jul 24 '17 at 14:39
Does the ID represent an index? If so, it should not be `int` but rather `vector::size_type`. Alternatively, could you instead use a `[unordered_]map` and pass its keys to/from the caller? — underscore_d, Jul 24 '17 at 15:18

Chris Drew · Accepted Answer · 2017-07-24T15:38:55.157

I think it is reasonable to use an index or an id instead of a pointer for long-lived references to objects in a vector for the reasons you give. I think it is also reasonable to wrap that id in a class for more type safety.

But I would normally stop short of making that class a full proxy for the real object. If the vector is really changing "constantly" from multiple threads then even something like

MyObjectManager::myObjects.at( myId ).name = name;

is not safe. Between getting the reference to the object using at and setting the name, the vector could have been resized and your reference could be invalidated.

You normally need to have some local guarantee in performance critical code that the vector is not going to be resized and it is safe to use references or pointers. The cost of looking up objects by id in a tight loop will probably outweigh any benefits you are getting from data locality anyway.

Overuse of MyObjectInterface is probably going to be the enemy of data locality given that unlike MyObject they are not guaranteed to be contiguous in memory.

If MyObjectInterface and/or MyObjectManager are actually required to provide complicated synchronization machinery then I suggest you post a separate question that includes that.

Contiguous Memory

1 Answers1