
I have a nested for loop that performs some calculations. The math has already been simplified as far as I can take it, but I still have a performance problem that I don't believe can be resolved, simply because of the sheer number of times these loops execute. I'm not familiar with any profiling tools that could tell me where the slowdowns actually occur, but I'm fairly certain it's just the number of iterations.

I would greatly appreciate any help trimming this down and improving the performance of this code. I'm trying to stay away from an HPC or highly parallelized solution, but if that's the only way to make this truly effective, then I'll look into going down that road.

Here's the code, with X = 20,000 and N_zero = 45,420 (values pulled from actual tests):

Dictionary<decimal, int> n_alpha = new Dictionary<decimal, int>();
Random rand = new Random();
decimal r = 0m;
decimal check = 0m;

for (int i = 0; i < X; i++)
{
    B = N_0 = N_1 = N0_ = N1_ = 0;
    for (int j = 0; j < N_zero; j++)
    {
        // need a random decimal value between 0 and 1
        r = (decimal)rand.Next() / int.MaxValue;
        if (r <= r1)
        {
            N0_ += 1;
            N_0 += 1;
        }
        else if (r1 < r && r <= r2)
        {
            B += 1;
            N0_ += 1;
            N_1 += 1;
        }
        else if (r2 < r && r <= r3)
        {
            B += 1;
            N_0 += 1;
            N1_ += 1;
        }
        else if (r > r3)
        {
            N1_ += 1;
            N_1 += 1;
        }

    }
    check = N_0 * N_1 * N0_ * N1_;
    if (check != 0)
    {
        decimal a = 1 - (B * N_zero) / ((N_0 * N1_) + (N0_ * N_1));
        // technically only tracking 4 decimal points, so key should reflect this
        decimal key = Math.Round(a, 4);
        if (n_alpha.ContainsKey(key))
        {
            n_alpha[key] += 1;
        }
        else
        {
            n_alpha.Add(key, 1);
        }
    }
}
shadonar
  • It's really hard to tell what you're trying to accomplish. If the loops are too slow, the likely solution is to redesign the logic to accomplish both your functional goal and your performance goal. Just seeing some numbers float around, it's really hard to tell what this should output. – chris-crush-code May 05 '17 at 17:29
  • I think you're looking for [CodeReview](https://codereview.stackexchange.com/). – Ousmane D. May 05 '17 at 17:29
  • If precision is not needed and you don't mind some rounding errors, you could change the decimal types to float. – Gilles May 05 '17 at 17:30
  • what are the values for r1, r2 and r3? – Gusman May 05 '17 at 17:30
  • I have tested it, if you remove the Random generation it halfs the execution time, may be you can use a faster random algorithm. – Gusman May 05 '17 at 17:34
  • Dictionary can also be surprisingly heavy when used in tight loops. – Alex Paven May 05 '17 at 17:36
  • @Gilles `decimal` is not immune to rounding errors. It just changes what kind of rounding errors occur. There's not enough context in this case to know for sure, but I agree that using `decimal` is likely inappropriate in this case (and would probably be a fairly significant cause of the poor performance). – Kyle May 05 '17 at 17:40
  • There are a great many oddities in this code which are suspicious. Rather than pointing out all of them at length, **learn to use a profiler**. Performance problems can only be consistently solved with *empirical measurements*. Set a goal, measure performance, compare it to your goal, find the slowest thing, try to fix it, and repeat until you're done. **Strangers on the internet cannot reliably tell you where the code is slow**. – Eric Lippert May 05 '17 at 17:55
  • That said, the most obvious thing to speed up this code is to use doubles, not decimals. Can you say why you chose decimal over double? Are these financial computations? – Eric Lippert May 05 '17 at 17:57
  • As an aside, I reimplemented this code as given using doubles (or ints where appropriate) and the version using doubles executes almost 14 times faster (it finishes in ~19 seconds compared to almost 4 and a half minutes for the code given). It also resulted in the same output given the same random seed. – Kyle May 05 '17 at 18:03
  • You said you are trying to avoid parallelization, but it would be simple to change the outer for loop to a Parallel.For(). If you do, be sure to put a lock() around the final if...else. – Dave Mackersie May 05 '17 at 18:04
  • You can simplify your `if`/`else` logic. If `r` is not `<= r1` then the second `if` could make the extreme leap of `r1 < r` without explicitly checking. And so on through the `if`s. Such is the miracle of `else`. – HABO May 05 '17 at 18:21
  • Assuming that `key` is between 0 and 1 (to 4 decimal places) it may be more efficient to use an array instead of a dictionary. – HABO May 05 '17 at 18:27
  • @HABO Based on my tests the keys in the dictionary can be negative (and their range doesn't seem to be well-defined). – Kyle May 05 '17 at 18:46
  • @EricLippert These are statistical computations, so accuracy was foremost in mind when attempting this. Also, I'll look into using a profiler, thanks for the info. – shadonar May 05 '17 at 19:10
  • @Kyle if doubles will reduce computation time by that much, then I guess doubles will be a major improvement. Thanks for the input. – shadonar May 05 '17 at 19:10
  • Do not think of doubles as *inaccurate*. They are *extremely accurate*. They *have representation error for most quantities that are exact decimals*. Just as decimals have representation error for most quantities that are *not* exact decimals. **The fact that you are rounding to four places indicates to me that you don't actually care about accuracy that much**. – Eric Lippert May 05 '17 at 19:55
  • The rule is to use decimals *when you must consistently represent exact decimal quantities*. That is: financial computations. Use doubles to represent physical or statistical quantities. If you think it is odd to be charged `$123.44999999999999999999999`, you should have used decimals instead of doubles. If you think that it is perfectly sensible for a computation to say that you are `123.449999999999999999999` centimeters tall and you're OK with rounding that to the nearest millimeter, then use doubles. – Eric Lippert May 05 '17 at 19:56
  • This loops approx. 900,000,000 times. How long does it take now, and how fast does it need to be? – RBarryYoung May 08 '17 at 17:58

1 Answer

  1. The use of rand.Next() is a bit of a performance problem. If you can live with a home-brewed, quick-and-dirty Linear Congruential Generator, you will be much better off (a rough sketch follows this list).

  2. The use of decimal is also a major performance problem. Do not use decimal if you can avoid it. Use double instead.

  3. The use of Dictionary can also be a bit of a problem, especially if the distribution of your data is such that it causes lots of hash collisions. I do not know what the range of your numeric data is, nor how it is distributed in that range, but if you can replace the dictionary with an array, by all means do it (see the second sketch after this list).

  4. You may be able to completely eliminate your inner nested loop. Since rand.Next() supposedly yields numbers with a uniform distribution, and since r1, r2 and r3 are known in advance and do not change inside the loops, you can simply calculate how many rs would fall below r1, how many would fall between r1 and r2, and so on, without actually drawing any rs. Then just add the corresponding quantities to your N_0, N_1, etc. instead of adding 1 each time.
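
For #1, here is a minimal sketch of what such a home-brewed generator could look like. The constants are the commonly used Numerical Recipes LCG parameters, and the helper name and seed are my own placeholders, not anything taken from the question:

// Hypothetical helper: a quick-and-dirty linear congruential generator.
// The modulus of 2^32 is applied implicitly by uint wrap-around.
static uint lcgState = 12345u; // any seed value will do

static double NextLcgDouble()
{
    lcgState = unchecked(1664525u * lcgState + 1013904223u);
    return lcgState * (1.0 / 4294967296.0); // scale to [0, 1)
}

// usage inside the inner loop, replacing the rand.Next() division:
// double r = NextLcgDouble();

Returning a double here also lines up with point #2, since it avoids the decimal division entirely.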
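
For #3, here is a rough sketch of an array-backed histogram, assuming the switch to double from point #2 and assuming you can put a bound on the rounded values. The minKey/maxKey bounds below are placeholders, since (as the comments note) the actual range of the keys isn't well-defined:

// Hypothetical bounds on the rounded alpha values; these would have to be
// chosen (or the index clamped) to cover the data actually produced.
const double minKey = -2.0;
const double maxKey = 2.0;
int[] nAlphaCounts = new int[(int)((maxKey - minKey) * 10000) + 1];

// inside the outer loop, replacing the dictionary update:
double key = Math.Round(a, 4);                        // a computed as a double
int index = (int)Math.Round((key - minKey) * 10000);  // 4 decimal places -> 10000 buckets per unit
nAlphaCounts[index]++;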

EDIT: Clarification about #4

So, judging by the way you issue r, it can have values from 0.0 to 1.0. (Inclusively or exclusively doesn't really matter.) So, I presume that r1, r2, and r3 are also between 0.0 and 1.0. Therefore, assuming a perfectly uniform distribution, your rs should be as follows:

  • r < r1 will occur N_Zero * r1 times

  • r1 < r < r2 will occur N_Zero * (r2 - r1) times

  • r2 < r < r3 will occur N_Zero * (r3 - r2) times

  • r3 < r will occur N_Zero * (1 - r3) times.
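
Put into code, that calculation could look something like the sketch below, assuming r1, r2 and r3 are doubles in [0, 1] and reusing the counter variables from the question; rounding the expected fractions to whole counts is my own choice here:

// expected bucket sizes for a perfectly uniform r over [0, 1]
int below_r1 = (int)Math.Round(N_zero * r1);             // r <= r1
int r1_to_r2 = (int)Math.Round(N_zero * (r2 - r1));      // r1 < r <= r2
int r2_to_r3 = (int)Math.Round(N_zero * (r3 - r2));      // r2 < r <= r3
int above_r3 = N_zero - below_r1 - r1_to_r2 - r2_to_r3;  // r > r3

// add whole bucket sizes at once instead of incrementing per draw,
// mirroring which counters each branch of the original if/else touched
N_0 = below_r1 + r2_to_r3;
N0_ = below_r1 + r1_to_r2;
N_1 = r1_to_r2 + above_r3;
N1_ = r2_to_r3 + above_r3;
B   = r1_to_r2 + r2_to_r3;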

Mike Nakis
  • How would you pre-determine how many r's would be issued for each of the checks? I'm looking into utilizing an array instead of the dictionary, but it'll take me some time to figure out how to handle the array and the indexes. Also with my requirements, I don't think the Linear Congruential Generator will work here, but I will keep it in mind and will do some testing. – shadonar May 08 '17 at 16:26
  • @shadonar I amended my answer. – Mike Nakis May 08 '17 at 17:23