A lot of performance work in C++ comes down to avoiding unnecessary copies.
The language itself has several features that help here, such as move semantics and perfect forwarding.
Still, there are times when it is hard to see the unnecessary copies.
Some of these copies are made by casting or assigning to the wrong type at the wrong time, which can end up being expensive, especially when using lazy evaluation libraries.
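To make the kind of hidden copy I mean concrete, here is a minimal sketch. The names `Tracked`, `g_copies`, `Table`, and the two helper functions are made up for illustration; the only standard-library fact it relies on is that the element type of `std::map<int, Tracked>` is `std::pair<const int, Tracked>`, so binding an element to `const std::pair<int, Tracked>&` creates a converted temporary and copies the value.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical global counter, incremented on every copy of a Tracked.
std::size_t g_copies = 0;

// Hypothetical instrumented type: copies are counted, moves are free.
struct Tracked {
    std::string payload;

    explicit Tracked(std::string p) : payload(std::move(p)) {}
    Tracked(const Tracked& other) : payload(other.payload) { ++g_copies; }
    Tracked(Tracked&&) noexcept = default;
    Tracked& operator=(const Tracked& other) {
        payload = other.payload;
        ++g_copies;
        return *this;
    }
    Tracked& operator=(Tracked&&) noexcept = default;
};

using Table = std::map<int, Tracked>;

// The reference type does not match the map's element type
// (std::pair<const int, Tracked>), so each iteration builds a
// temporary std::pair<int, Tracked> and copies the Tracked value.
std::size_t copies_with_wrong_type(const Table& table) {
    g_copies = 0;
    std::size_t total_length = 0;
    for (const std::pair<int, Tracked>& entry : table) {
        total_length += entry.second.payload.size();
    }
    (void)total_length;  // only the copy count matters here
    return g_copies;
}

// const auto& binds directly to the stored element: no copy is made.
std::size_t copies_with_auto(const Table& table) {
    g_copies = 0;
    std::size_t total_length = 0;
    for (const auto& entry : table) {
        total_length += entry.second.payload.size();
    }
    (void)total_length;
    return g_copies;
}
```

Both loops look almost identical and both compile, but on a map with N entries the first should report N copies and the second none. An instrumented type like this only works when I already suspect a particular type, which is why I am asking about something more systematic.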
Besides measuring run time and looking closely at code that seems suspiciously slow, is there a better way of avoiding these issues? Maybe some form of memory benchmarking procedure?