As has already been said, you really need to measure the benefit of optimisation within the context of typical use cases for your own applications, in typical target environments. I include timers in my own automated regression suite for this reason, and have found some quite unusual results, as documented in a previous question. FWIW, I'm finding VS2010 SP1 generates code about 8% faster on average than VS2008 on my own application, rising to about 13% with whole-program optimization. This gain is not spread evenly across use cases. I also tend to see significant variations between long test runs which are not visible when profiling much smaller test cases. I haven't carried out platform comparisons yet, so I can't say whether many of the gains are platform- or hardware-specific.
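To illustrate the kind of timer I mean, here's a minimal sketch (not my actual harness; `run_use_case` is a hypothetical stand-in for one of your application's own use cases):

```cpp
#include <ctime>
#include <cstdio>

volatile double sink = 0.0;  // defeat dead-code elimination

// Stand-in workload; in a real suite this would exercise one of the
// application's typical use cases, not a synthetic loop.
void run_use_case()
{
    double acc = 0.0;
    for (int i = 1; i < 1000000; ++i)
        acc += 1.0 / i;
    sink = acc;
}

// Average over several iterations so a single run's jitter
// doesn't dominate the measurement.
double time_use_case(int iterations)
{
    std::clock_t start = std::clock();
    for (int i = 0; i < iterations; ++i)
        run_use_case();
    std::clock_t end = std::clock();
    return double(end - start) / CLOCKS_PER_SEC / iterations;
}

int main()
{
    std::printf("avg seconds per use case: %f\n", time_use_case(20));
    return 0;
}
```

Averaging over repeated runs helps with run-to-run jitter, though as noted above, long test runs can still reveal variations that short loops like this miss.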
I would imagine that many optimisers are fine-tuned to give their best results against well-known benchmark suites, which in turn could imply that these are not the best pieces of code against which to test the benefits of optimisation. (Speculation, of course.)