There's a chart of measurements, from the worked example in the Pitfalls presentation, showing the effect of standard OO, then of locality with contiguous storage, and then of further improvements from reorganising the data structures, flattening them further, and prefetching - http://seven-degrees-of-freedom.blogspot.co.uk/2009/12/pitfalls-of-object-oriented-programming.html?view=flipcard
The "Pitfalls of Object Oriented Programming" presentation, written for games programmers, which I found extremely clear, explains the problem with traditional pointer-heavy objects on modern systems - http://research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf
Furthermore, it's not just about cache prefetch and locality on a server/desktop system; scattered, fragmented object allocations also risk page faults (possibly to/from disk), which can make them very inefficient. Secondly, if you want to use multiple cores, the problem with updating objects is access contention and locking, because each object has no way of knowing that other accesses to it are "safe".
So to increase locality there are gains from packing objects as tightly as possible and keeping blocks of them on the same page in memory (especially for small items which share cache lines). This is why, in C, preallocating an array of nmemb structures with void *calloc(size_t nmemb, size_t size) can be much more efficient than allocating them one at a time with void *malloc(size_t size).
Now suppose that most of the time you run computations repeatedly over a large number of objects, such as summing a particular value in each object, or transforming them in some way that updates certain fields. Then you'd like to organise the data so it's packed together, so that as much as possible sits on one cache line and you aren't loading from RAM (or disk) the things you rarely need or use, such as identification strings, link pointers or other values you don't expect to need for a while. These fields can actually be independent, so a summation could occur simultaneously with a transform of different fields. Processing a large list of items can also be farmed out across CPUs simply by splitting it in half or quarters, without contention for a lock required to safely read or update every object: you know the data you're interested in isn't contended for.
In that case, you'd rather have parallel arrays, which you can scan sequentially with a predictable access pattern that the CPU can detect and anticipate, so it initiates loads automatically. There are no pointer chains, and the order of items is implicit, so you maximise data density by avoiding wasteful 4/8-byte pointers. And you can reduce lock contention to the fields which actually require simultaneous updates from more than one thread. To see why it matters, Herb Sutter's machine architecture talk is interesting (but a long video) - http://www.youtube.com/watch?feature=player_detailpage&v=L7zSU9HI-6I#t=897
When you start thinking like that, you are considering Data Oriented Design - What is data oriented design?
It can help performance massively if it suits your problem, and it is used by games programmers -
http://gamedevelopment.tutsplus.com/articles/what-is-data-oriented-game-engine-design--cms-21052
There's another previous answer here with a more detailed explanation and links to some very detailed papers on modern memory issues - What is "cache-friendly" code?