#include <iostream>
#include <unistd.h>


using namespace std;


struct object
{
    int i; // the only member the loop ever writes
    int j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z; // 18 ints in total: 72 bytes with a 4-byte int
};

object *objectArray;
const int arraySize = 1920 * 1080;

int main()
{
    objectArray = new object[arraySize];

    while (1)
    {
        // Touch only the first member of each element.
        for (int i = 0; i < arraySize; i++)
        {
            objectArray[i].i = 1234;
        }

        sleep(1);
    }

    return 0;
}

I used the above code to test this. I compiled this program with:

g++ -O3 -std=c++14 src/main.cpp -o bin/main

As it stands, the for loop takes about 15–20 ms to complete.
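
A minimal sketch of how one pass of the loop can be timed with std::chrono (the actual measurement code is omitted above, so this is an assumption, not the original harness):

#include <chrono>
#include <iostream>

struct object
{
    int i;
    int j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z;
};

int main()
{
    const int arraySize = 1920 * 1080;
    object *objectArray = new object[arraySize];

    // Note: the very first pass also pays for page faults on first touch;
    // time a later pass for steadier numbers.
    auto start = std::chrono::steady_clock::now();
    for (int n = 0; n < arraySize; n++)
    {
        objectArray[n].i = 1234;
    }
    auto stop = std::chrono::steady_clock::now();

    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count()
              << " us\n";

    delete[] objectArray;
    return 0;
}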

When struct object looks like this,

struct object
{
    int i;
};

the for loop takes about 0.8–1.2 ms to complete.

How and why does the size of struct object impact performance when I'm not even accessing all the members?

tomatocake

3 Answers


Modern desktop CPUs have hierarchical memory: accesses to main memory go through a series of caches (e.g. the L2 and L1 caches), which are increasingly smaller and faster. Data that has never been seen before is first loaded into the cache and then into the CPU registers from there, and results are stored back into the cache. The cache is only written back to main memory later on.

If multiple operations all affect data that is in the cache, then only a single write back to memory is required at the end of the whole set of operations, which can be dramatically faster than going to main memory for each individual operation.

Moreover, memory is transferred to and from the cache in large blocks, called cache lines. Typical cache line sizes are 64 bytes or 128 bytes.

So when your struct is { int i; }, accessing the first element of the array already brings a number of subsequent elements into the cache, and several iterations can be served by a single fetch from main memory. When the struct is large, one cache line contains the i member of only a single array element, and so you need to go to main memory for every element.
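
You can check the sizes involved directly. A small sketch, assuming a 4-byte int and 64-byte cache lines (typical on desktop hardware, but not guaranteed):

#include <cstddef>
#include <iostream>

struct small_obj { int i; };
struct large_obj { int i; int j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z; };

int main()
{
    const std::size_t lineSize = 64; // assumed cache line size in bytes

    // Typically prints 4 and 72 (with a 4-byte int): 16 small elements
    // share one cache line, while consecutive large elements always
    // start in different lines.
    std::cout << "sizeof(small_obj) = " << sizeof(small_obj)
              << ", elements per line: " << lineSize / sizeof(small_obj) << '\n';
    std::cout << "sizeof(large_obj) = " << sizeof(large_obj) << '\n';
    return 0;
}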

Modern processors try to predict which memory you will need next and start fetching it speculatively, but all the same, accessing main memory is orders of magnitude slower than accessing the cache, and so the large-stride version of the loop is significantly more expensive.

It is for this reason that it is important to consider access patterns when optimizing code (and data layout!) for performance. This is where you would weigh "array of structs" against "struct of arrays"; see the sketch below. Or, as the general wisdom goes, "most of the time, performance problems are the result of a poor choice of data structures".
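
Applied to the code in the question, a struct-of-arrays layout might look like the following sketch (the names are made up for illustration):

#include <vector>

// Struct-of-arrays: each member lives in its own contiguous array, so a
// loop that only touches 'i' walks densely packed memory.
struct objects_soa
{
    std::vector<int> i;
    // ... one vector per member that is actually needed
};

int main()
{
    const int arraySize = 1920 * 1080;

    objects_soa objs;
    objs.i.resize(arraySize);

    // Every 64-byte cache line now delivers 16 consecutive 'i' values
    // (with a 4-byte int), instead of one 'i' per 72-byte struct.
    for (int n = 0; n < arraySize; n++)
    {
        objs.i[n] = 1234;
    }
    return 0;
}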

Kerrek SB

It has to do with how a computer works under the hood. Generally, items that are close together in memory are faster to access. This is because the processor reads memory in aligned chunks and stores them in its cache. When the integers you actually touch are spread out across large structs, far fewer of them fit into the cache at once. Take a look at this question:

Purpose of memory alignment
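
To illustrate the alignment point, here is a small sketch (the exact padding is implementation-defined, but this is what typical ABIs produce):

#include <iostream>

struct padded
{
    char c; // 1 byte
    int  i; // 4 bytes, but aligned to a 4-byte boundary, so the
            // compiler inserts 3 padding bytes after 'c'
};

int main()
{
    std::cout << sizeof(padded) << '\n'; // prints 8 on typical platforms: 1 + 3 + 4
    return 0;
}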

chasep255

While you are used to working with individual bytes of memory, that's not really how the hardware works: the channel for moving data to and from main memory is optimized for transferring larger chunks. For example, it might transfer an entire cache line of 64 consecutive bytes, even if only a single byte was needed.

This matches your performance numbers very closely: one 4-byte int is one sixteenth of a 64-byte cache line, so the fast version needs one line fetch per 16 elements, while the slow version (whose 72-byte elements each start in a different line) needs one fetch per element, and indeed the slower code takes about sixteen times longer than the fast code.
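
A back-of-the-envelope sketch of that prediction (the 64-byte line size and 4-byte int are assumptions about the test machine):

#include <iostream>

int main()
{
    const int lineBytes  = 64; // assumed cache line size
    const int fastStride = 4;  // sizeof(object) with just { int i; }
    const int slowStride = 72; // sizeof(object) with 18 ints

    // Fast case: 16 consecutive elements share one cache line.
    std::cout << "fast: one line fetch per " << lineBytes / fastStride << " elements\n";

    // Slow case: the stride exceeds the line size, so each write to .i
    // lands in a line that hasn't been fetched yet.
    std::cout << "slow: " << slowStride << "-byte stride > " << lineBytes
              << "-byte line, so one fetch per element\n";

    std::cout << "predicted slowdown: " << lineBytes / fastStride << "x\n"; // 16x
    return 0;
}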