
I'm coding a physics simulation and I now feel the need to optimize it. I'm thinking about improving one point: one of the methods of one of my classes (which I call a billion times in several cases) defines a probability distribution every time it runs. Here is the code:

void myClass::myMethod(){ //called billions of times in several cases
    uniform_real_distribution<> probd(0,1);
    uniform_int_distribution<> probh(1,h-2);
    uniform_int_distribution<> probv(1,v-2);
    //rest of the code
}

Could I make the distributions members of the class so that I don't have to define them every time, and just initialize them in the constructor and redefine them when h and v change? Would that be a worthwhile optimization? And one last question: is this something the compiler (g++ in my case) already takes care of when compiling with -O2 or -O3?
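
Concretely, here is a sketch of what I mean (the setter is hypothetical, just to show where I'd redefine the distributions; assumes #include <random> and using namespace std as in the snippet above):

class myClass {
public:
    myClass(int h_, int v_)
        : h(h_), v(v_), probd(0, 1), probh(1, h_ - 2), probv(1, v_ - 2) {}

    void setSize(int h_, int v_) { // hypothetical: called wherever h and v change
        h = h_;
        v = v_;
        probh.param(uniform_int_distribution<>::param_type(1, h - 2));
        probv.param(uniform_int_distribution<>::param_type(1, v - 2));
    }

    void myMethod(); // would use the members instead of defining distributions

private:
    int h, v;
    uniform_real_distribution<> probd;
    uniform_int_distribution<> probh;
    uniform_int_distribution<> probv;
};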

Thank you in advance!

Update: I coded it and timed both versions: the program actually turned out a bit slower (by a few percent), so I'm back where I started: creating the probability distributions on each call.

Liam
  • Instead of trying to figure out whether this particular optimization is viable, run your simulation through a profiler, and focus on optimizing the parts that it is spending the most time on. You may find that the amount of time the program spends constructing the distribution is so negligible (compared to other functions) that your time is better spent elsewhere. – JBentley Oct 31 '13 at 14:42

3 Answers


Answer A: I shouldn't think so; for a uniform distribution it's just going to copy the parameter values into place, maybe with a small amount of arithmetic, and that will be well optimized.

However, I believe distribution objects can have state. They can use part of the random data from a call to the generator, and are permitted to save the rest of the randomness to use the next time the distribution is used, in order to reduce the total number of calls to the generator. So when you destroy a distribution object, you might be discarding some possibly-costly random data.
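
For instance (an illustrative sketch; normal_distribution shows this more clearly than the uniform ones, since common implementations cache the second value of each generated pair):

#include <random>

int main() {
    std::mt19937 gen(42);
    std::normal_distribution<> nd(0.0, 1.0);

    double a = nd(gen); // may consume two generator values and cache one
    double b = nd(gen); // may be served from the cached value, no generator call
    nd.reset();         // discards any cached randomness
    double c = nd(gen); // draws fresh values from the generator again
    (void)a; (void)b; (void)c;
}

Destroying and reconstructing the object on every call has the same effect as calling reset() every time: any saved randomness is thrown away.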

Answer B: stop guessing and test it.

Time your code, then add static to the definition of probd and time it again.
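
In sketch form (only for probd, whose parameters never change; note that since C++11 the initialization of a function-local static is thread-safe, but concurrent use of one shared distribution object from several threads is not):

void myClass::myMethod(){ //called billions of times in several cases
    static uniform_real_distribution<> probd(0,1); // constructed once, on first call
    uniform_int_distribution<> probh(1,h-2); // still per call, since h and v may change
    uniform_int_distribution<> probv(1,v-2);
    //rest of the code
}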

Steve Jessop
  • Didn't downvote for Answer A. Upvoted for Answer B. – nhgrif Oct 31 '13 at 12:17
  • @Steve Jessop: Thank you for the suggestion. Strangely, adding the distributions as members to my class object slowed the program down (by a few percent), but adding static to their definition inside the loop has improved it a bit. Do you know why? Should I still be cautious when using "static", i.e. are memory leaks more likely in general? – Liam Nov 01 '13 at 15:54
  1. Yes
  2. Yes
  3. Well, there may be some advantage, but AFAIK those objects aren't really heavyweight/expensive to construct. Also, with locals you may gain something in data locality and in assumptions the optimizer can make.
  4. I don't think they are automatically hoisted into class members (especially if your class is a POD - in that case I doubt the compiler will dare to modify its layout); more likely they are optimized away completely - only the code of the called methods (in particular operator()) may remain, referring directly to h and v. But this has to be checked by looking at the generated assembly.

Incidentally, if you have a performance problem, besides optimizing the obvious points (non-optimal algorithms in inner loops, continuous memory allocations, useless copies of big objects, ...), you should really use a profiler to find the real "hot spots" in your code and concentrate on optimizing those, instead of going randomly through all the code.

Matteo Italia

uniform_real_distribution maintains state of type param_type, which is two double values (with the default template parameters). The constructor assigns to these and is otherwise trivial; the destructor is trivial.

Therefore, constructing a temporary within your function has an overhead of storing two double values, compared to initializing one pointer (or reference) or going through an indirection via this. In theory, the member version might therefore be faster (though what appears to be faster, or what would seem to make sense to run faster, isn't necessarily any faster). Since it's not much work, it's certainly worth trying and timing whether there's a difference, even if it is a micro-optimization.

Some 3-4 extra cycles are normally negligible, but since you're saying "billions of times" it may of course very well make a measurable difference: 3 cycles times one billion is 1 second on a 3 GHz machine.

Of course, optimization without profiling is always somewhat... awkward. You might very well find that a different part in your code that's called billions of times saves a lot more cycles.
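
If the per-call construction does show up in your measurements, note that operator() also has an overload taking a param_type, so you can keep one distribution object around and feed it fresh bounds on each call instead of constructing a new object (illustrative sketch; gen and drawRow are placeholders, not names from the question):

#include <random>

std::mt19937 gen;
std::uniform_int_distribution<> probh; // parameters supplied at call time

int drawRow(int h) {
    // same result distribution as uniform_int_distribution<>(1, h - 2)(gen),
    // but without constructing a new object
    return probh(gen, std::uniform_int_distribution<>::param_type(1, h - 2));
}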

EDIT:
Since you're not going to modify it, and since the first distribution is initialized with literal values, you might actually hoist it out entirely (e.g. as a namespace-level static). That should, regardless of the other two, allow the compiler to generate the most efficient code for that one in any case.
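
A sketch of that idea (one caveat: operator() is non-const, so the object itself can't be declared const if it is used directly; a namespace-level static works):

#include <random>

namespace {
    // constructed once; the (0, 1) parameters never change
    std::uniform_real_distribution<> probd(0.0, 1.0);
}

double sample(std::mt19937& gen) { // illustrative helper
    return probd(gen); // no per-call construction
}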

Damon
  • Well, the 3-4 cycles may very well remain negligible even if the function is called trillions of times; you have to compare them to the rest of the body of the function. – Matteo Italia Oct 31 '13 at 14:01
  • @MatteoItalia: True, but without profiling and only looking at isolated pieces of code, this is tough. Without having a reliable measurement of where your cycles go, shaving off 3 cycles here and there is probably as good as you can get. – Damon Oct 31 '13 at 15:10
  • Of course, I was just pointing out that "3 cycles" is meaningful only when compared with the rest of the function; if he's doing anything slightly more complicated than a few arithmetic operations, those 3 cycles are going to be well within experimental error. – Matteo Italia Oct 31 '13 at 15:19