For many high-performance applications, such as game engines or financial software, cache coherency, memory layout, and cache misses are crucial to maintaining smooth performance. As the C++ standard has evolved, especially with the introduction of move semantics in C++11 and the further changes in C++14, it has become less clear where to draw the line between pass by value and pass by reference for mathematical POD-based classes.
Consider the common POD Vector3 class:
    class Vector3
    {
    public:
        float32 x;
        float32 y;
        float32 z;
        // Implementation functions below (all non-virtual)...
    };
This is the most commonly used math structure in game development. It is a non-virtual, 12-byte class, even in 64-bit builds, since we are explicitly using IEEE float32, which uses 4 bytes per float. My question is as follows: what is the general best-practice guideline for deciding whether to pass POD mathematical classes by value or by reference in high-performance applications?
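To make the layout assumptions concrete, here is a minimal sketch of what I have in mind. It assumes `float32` is simply an alias for the built-in `float`, and the two helpers `dot_by_value` and `dot_by_ref` are hypothetical names used only to show the two signatures being compared; they are not from any real code base.

```cpp
#include <type_traits>

// Assumption: float32 is a plain alias for the built-in float.
using float32 = float;

class Vector3
{
public:
    float32 x;
    float32 y;
    float32 z;
};

// These checks hold on mainstream platforms where float is a 4-byte IEEE 754 type.
static_assert(sizeof(float32) == 4, "float32 is expected to be 4 bytes");
static_assert(sizeof(Vector3) == 12, "three floats with no padding");
static_assert(std::is_trivially_copyable<Vector3>::value,
              "copying is a plain byte copy, so a move is no cheaper than a copy");
static_assert(std::is_standard_layout<Vector3>::value,
              "no vtable pointer or other hidden members");

// Hypothetical helper: both arguments are copied into the callee.
inline float32 dot_by_value(Vector3 a, Vector3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Hypothetical helper: the callee reads through the caller's objects.
inline float32 dot_by_ref(const Vector3& a, const Vector3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

Both helpers compute the same result; the question is only about which signature the compiler can turn into better code when the call is not inlined.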
Some things for consideration when answering this question:
- It is safe to assume the default constructor does not initialize any values
- It is safe to assume no arrays beyond 1D are used for any POD math structures
- Clearly most people pass 4-8 byte POD constants by value, so there doesn't seem to be much debate there
- What happens when this Vector is a class member variable vs. a local variable on the stack? If pass by reference is used, the callee would use the memory address of the member inside the class rather than the address of a local on the stack. Does this use case matter? Could this difference in where pass by reference is used result in more cache misses?
- What about the case where SIMD is used or not used?
- What about move-semantics compiler optimizations? I have noticed that when switching to C++14, the compiler will often use move semantics when chained function calls are made passing the same vector by value, especially when it is const. I observed this by inspecting the generated assembly.
- When using pass by value and pass by reference with these math structures, does const have much impact on compiler optimizations? See the above point.
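The member-versus-local case from the list above can be sketched as follows. Everything here is illustrative: `Entity`, `length_squared_pbv`, `length_squared_pbr`, and `local_case` are hypothetical names, and `float32` is again assumed to be an alias for `float`.

```cpp
// Assumption: float32 is a plain alias for the built-in float.
using float32 = float;

class Vector3
{
public:
    float32 x;
    float32 y;
    float32 z;
};

// Pass by value with a const parameter: the callee owns a private copy.
inline float32 length_squared_pbv(const Vector3 v)
{
    return v.x * v.x + v.y * v.y + v.z * v.z;
}

// Pass by const reference: the callee reads through the caller's address.
inline float32 length_squared_pbr(const Vector3& v)
{
    return v.x * v.x + v.y * v.y + v.z * v.z;
}

// Hypothetical owner type: the vector is a member, so a reference to it
// points into wherever the owning Entity happens to live.
class Entity
{
public:
    Vector3 position;

    float32 distance_sq_pbv() const { return length_squared_pbv(position); }
    float32 distance_sq_pbr() const { return length_squared_pbr(position); }
};

// Local case: the vector lives in the current stack frame (or in registers).
inline float32 local_case()
{
    const Vector3 local{1.0f, 2.0f, 2.0f};
    return length_squared_pbv(local) + length_squared_pbr(local);
}
```

In all four call paths the arithmetic is identical; what I am unsure about is whether the source of the argument (member vs. local) should influence the choice of signature.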
Given the above, what is a good guideline for when to use pass by value vs. pass by reference with modern C++ compilers (C++14 and above) to minimize cache misses and promote cache coherency? At what point might someone say a POD math structure is too large for pass by value, such as a 4x4 affine transform matrix, which is 64 bytes in size assuming use of float32? Does it matter, when making this decision, whether the Vector, or rather any small POD math structure, is declared on the stack or referenced as a member variable?
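For the larger case, here is a similar sketch of the 4x4 matrix. The `Matrix4` class, its 1D `m[16]` storage, and the helpers `identity`, `trace_by_value`, and `trace_by_ref` are assumptions made for illustration, with `float32` again taken to be an alias for `float`.

```cpp
#include <type_traits>

// Assumption: float32 is a plain alias for the built-in float.
using float32 = float;

// Hypothetical 4x4 matrix stored as a flat 1D array of 16 floats.
class Matrix4
{
public:
    float32 m[16];
};

// 16 floats * 4 bytes = 64 bytes on platforms with a 4-byte float.
static_assert(sizeof(Matrix4) == 64, "4x4 float32 matrix is 64 bytes");
static_assert(std::is_trivially_copyable<Matrix4>::value,
              "copying is a plain 64-byte copy");

// Builds an identity matrix; Matrix4 r{} zero-initializes every element first.
inline Matrix4 identity()
{
    Matrix4 r{};
    r.m[0] = r.m[5] = r.m[10] = r.m[15] = 1.0f;
    return r;
}

// Pass by value: conceptually copies all 64 bytes into the callee.
inline float32 trace_by_value(Matrix4 a)
{
    return a.m[0] + a.m[5] + a.m[10] + a.m[15];
}

// Pass by const reference: only an address is handed to the callee.
inline float32 trace_by_ref(const Matrix4& a)
{
    return a.m[0] + a.m[5] + a.m[10] + a.m[15];
}
```

The size difference between this and the 12-byte Vector3 is what makes me wonder where the cutoff for pass by value should sit.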
I am hoping someone can provide some analysis and insight so that a good modern best-practice guideline can be established for the above situation. I believe the line between pass by value and pass by reference for POD classes has become blurrier as the C++ standard has evolved, especially with regard to minimizing cache misses.