Before anyone tells me to look up old answers or to RTFM, please note that I've already done so, so please read the details before directing me elsewhere.
I've established that the difference between optimization levels isn't as simple as a different set of optimization flags being enabled at the higher level.
For example, I first found the difference in optimization flags between -O0 and -O1 by following these steps:
gcc -c -Q -O1 --help=optimizers > /tmp/O1-opts
gcc -c -Q -O0 --help=optimizers > /tmp/O0-opts
diff /tmp/O0-opts /tmp/O1-opts | grep enabled
This gave me a list of the optimization flags enabled by -O1 over -O0.
Then I compiled the code with -O0 but added all the individual optimization flags enabled by -O1 over -O0 (roughly as sketched below), because the result should be the same as -O1, right? Well, guess what, it's not!
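For concreteness, the recompile step looked roughly like this (a sketch from memory; the parsing of the diff output is approximate and test.c is just a placeholder for my source file):

O1_EXTRA=$(diff /tmp/O0-opts /tmp/O1-opts | grep enabled | awk '{print $2}')   # pull out the flag names that flipped to [enabled]
gcc -O0 $O1_EXTRA -S test.c -o /tmp/test-O0-plus-flags.s
gcc -O1 -S test.c -o /tmp/test-O1.s
diff /tmp/test-O0-plus-flags.s /tmp/test-O1.s   # not empty: the generated code still differs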
So this shows that the difference between optimization levels is not simply which optimization flags are turned on; there must be more going on than the flags that gcc/g++ reports.
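As far as I can tell, gcc's own report even agrees that the two configurations enable the same flags, yet the generated code differs (again a sketch, reusing $O1_EXTRA from above):

gcc -c -Q -O0 $O1_EXTRA --help=optimizers > /tmp/O0-plus-opts
diff /tmp/O1-opts /tmp/O0-plus-opts   # essentially empty as I understand it, yet the .s files above still differ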
If someone already knows the answer to this, please let me know; otherwise I'll have to dig through the gcc source code, which wouldn't be trivial for me. Thank you!
As for why I'm looking for this: I have some AVX-512 code that shows an L1D cache-miss rate of under 3% with -O0 (or no optimization flag), but over 37% with -O1 and above (even though the code gets faster). If I can figure out which (hidden) flag is causing this, I might be able to speed up the code even further. There are too many flags in the common.opt file in the gcc source code, so I've hit a wall.
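For reference, the miss rates above came from hardware counters along these lines (the exact event names vary by CPU and kernel, and ./avx512_prog is a placeholder for my binary); -fdump-passes is the closest thing I've found so far to seeing what an -O level really switches on beyond the documented flags:

perf stat -e L1-dcache-loads,L1-dcache-load-misses ./avx512_prog   # miss rate = misses / loads
gcc -O0 -fdump-passes -c test.c 2> /tmp/passes-O0
gcc -O1 -fdump-passes -c test.c 2> /tmp/passes-O1
diff /tmp/passes-O0 /tmp/passes-O1   # should list passes toggled by -O1, including ones with no obvious -f flag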