I am working on a project where we were asked to write simple OpenMP code to parallelize a program that works with differential equations. We were also asked to test the performance of the code with and without compiler optimizations. I'm working with the Sun CC compiler, so for the optimized version I used the options
-xopenmp -fast
and for the non-optimized version
-xopenmp=noopt
Not surprisingly, the running time with compiler optimisation on was much lower than in the other case. What surprises me is that the scaling performance is much better for the non-optimised version. Here, by performance I mean the speed-up coefficient, that is, the ratio of the running time of the program run on 1 processor to the running time of the program run on M processors.
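To make the metric concrete, here is a minimal sketch in C of how the speed-up could be computed from two measured wall-clock times. It assumes the usual convention, speed-up = T(1) / T(M), so larger is better. The function names `speedup` and `efficiency` are made up for illustration and are not part of my actual program.

```c
/* Speed-up on M processors: serial time divided by parallel time.
 * t1 = wall-clock time on 1 processor, tm = wall-clock time on M processors.
 * Both names are hypothetical helpers, not part of any library. */
double speedup(double t1, double tm)
{
    return t1 / tm;
}

/* Parallel efficiency: speed-up divided by the processor count.
 * A value of 1.0 would correspond to ideal (linear) scaling. */
double efficiency(double t1, double tm, int m)
{
    return speedup(t1, tm) / (double)m;
}
```

For example, a run taking 10 s on 1 processor and 2.5 s on 4 processors would give a speed-up of 4 and an efficiency of 1 under this definition.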
It was hinted that this could be because the optimised version is memory-bound, while the non-optimised version is CPU-bound. I am not sure how the "boundness" could influence the scaling capability of my code. Do you have any suggestions?