I was watching a talk, "Efficiency with Algorithms, Performance with Data Structures", and was surprised by the comment that in:
#include <string>
#include <unordered_map>
#include <memory>

struct Foo {
    int x;
};

Foo* getFoo(std::string key,
            std::unordered_map<std::string,
                               std::unique_ptr<Foo>> &cache) {
    if (cache[key])
        return cache[key].get();
    cache[key] = std::unique_ptr<Foo>(new Foo());
    return cache[key].get();
}

Foo* getFooBetter(std::string key,
                  std::unordered_map<std::string,
                                     std::unique_ptr<Foo>> &cache) {
    std::unique_ptr<Foo> &entry = cache[key];
    if (entry)
        return entry.get();
    entry = std::unique_ptr<Foo>(new Foo());
    return entry.get();
}
getFooBetter() is better. I had been under the belief that I could rely on the compiler to perform this sort of transformation, in the same way that I'd expect multiple occurrences of x+y to be evaluated only once. Unsurprisingly, the generated LLVM IR indeed agrees with the presenter: even with -O9, we're left with 3 calls to cache[key] in the getFoo() version.
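(For the kind of merging I had in mind, a toy example of my own: I'd expect the repeated x + y below to be evaluated once and reused.)

int sumAndProduct(int x, int y) {
    int a = x + y;
    int b = x + y;   // expected to reuse the value already computed for a
    return a * b;
}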
I've moved the long LLVM IR of both functions, with the C++ symbols demangled, out of line so as not to be visually offensive.
Another Stack Overflow question reveals that part of the answer here is that operator[] is assumed to be able to modify whatever global state it wishes, and thus its calls can't be elided. A linked proposal about introducing a [[pure]] annotation talks about its applications to common subexpression elimination (CSE).
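To make that concrete for myself, here is a minimal sketch (mine, not from the proposal) using GCC/Clang's existing __attribute__((pure)) as a stand-in for the proposed [[pure]]; the lookup functions are made up:

// Hypothetical declarations, for illustration only.
int lookupOpaque(int key);                      // may read or write any global state
int lookupPure(int key) __attribute__((pure));  // promises no side effects; result
                                                // depends only on arguments and memory

int twiceOpaque(int key) {
    // Both calls must be emitted: the first might change state that the
    // second observes, so they aren't common subexpressions.
    return lookupOpaque(key) + lookupOpaque(key);
}

int twicePure(int key) {
    // Here the optimizer is allowed to call lookupPure() once and reuse
    // the result, much like a repeated x + y.
    return lookupPure(key) + lookupPure(key);
}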
If we stayed at 4 calls, I'd be able to end here feeling satisfied. However, if my reading of the IR is correct, it looks like getFoo() was optimized as if we had written:
Foo* getFoo(std::string key,
            std::unordered_map<std::string,
                               std::unique_ptr<Foo>> &cache) {
    if (cache[key])
        return cache[key].get();
    std::unique_ptr<Foo> &entry = cache[key];
    entry = std::unique_ptr<Foo>(new Foo());
    return entry.get();
}
Would someone be able to explain what clang's view of the code is, such that it was able to merge the last two cache[key] calls but not all of them? (My local clang is 3.4.)
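(As an aside, and separate from the question itself: if the goal is just to avoid the repeated lookups at the source level, something like the following sketch, which I haven't benchmarked, does a single find() on the hit path and only falls back to operator[] on a miss. It sticks to C++11, since std::make_unique is C++14; the name getFooFind is mine.)

// Sketch only: one hash lookup on the common (hit) path.
Foo* getFooFind(const std::string &key,
                std::unordered_map<std::string,
                                   std::unique_ptr<Foo>> &cache) {
    auto it = cache.find(key);                 // single lookup when the entry exists
    if (it != cache.end() && it->second)
        return it->second.get();
    std::unique_ptr<Foo> &entry = cache[key];  // second lookup only on a miss
    entry.reset(new Foo());
    return entry.get();
}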