Not all optimizations have individual flags, so no combination of them will generate the same code as using -O1 or any of the other general optimization options (-Os, -O2, etc.). Also, I imagine that a lot of the specific optimization options are ignored when you use -O0 (the default), because they require passes that are skipped if optimization hasn't been generally enabled.
To narrow down the source of your performance increase, you can try using -O1 and then selectively disabling optimizations. For example:
g++ -O1 -fno-peephole -fno-tree-cselim -fno-var-tracking ...
You still might not have any better luck this way, though. Several optimizations working in combination might be producing your performance increase. It could also be the result of optimizations not covered by any specific flag.
I also doubt that better cache locality resulted in your "incredible boost in performance". If it did, it was likely a coincidence, especially at -O1. Big performance increases usually come about because GCC was able to eliminate a chunk of your code, either because it didn't actually have any net effect, because it always resulted in the same value being computed, or because it invoked undefined behaviour.
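To make that last point concrete, here is a minimal sketch of the kinds of code an optimizing build is allowed to throw away. The function names are made up for illustration and are not taken from your program, and whether GCC actually removes each piece depends on the version and the -O level, so treat this as something to experiment with rather than a guarantee.

```cpp
#include <cstdint>

// Hypothetical benchmark kernel: the sum is computed and then thrown away.
// The loop has no observable effect, so an optimizing build is free to
// delete it entirely, which makes any timing of this function meaningless.
void work_discarded(std::uint32_t n) {
    std::uint64_t sum = 0;
    for (std::uint32_t i = 0; i < n; ++i) {
        sum += static_cast<std::uint64_t>(i) * 2;
    }
    (void)sum;  // the result never escapes the function
}

// Same loop, but the result is returned, so the compiler has to produce it
// (it may still replace the loop with something cheaper).
std::uint64_t work_used(std::uint32_t n) {
    std::uint64_t sum = 0;
    for (std::uint32_t i = 0; i < n; ++i) {
        sum += static_cast<std::uint64_t>(i) * 2;
    }
    return sum;
}

// Signed overflow is undefined behaviour, so the compiler may assume that
// x + 1 never overflows and fold this comparison to `true`.
bool next_is_greater(int x) {
    return x + 1 > x;
}
```

If you put something like this in a file and compare the assembly from g++ -O0 -S with the assembly from g++ -O1 -S, you should be able to see whether the loop in work_discarded survives. Doing the same with your real code is probably the quickest way to find out what actually disappeared.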