1

I am not sure if this belongs here or Programmers.SE.

I have a simple for loop with a few if statements inside. The for loop needs to go over 200,000 elements and it is very slow in doing this. It takes over 10 minutes to run the loop and I have to run this loop 900 times. The code is below. I've timed the inside body of the loop and it is rather fast. The slow loop is because of the number of elements and NOT the body of the loop. This suggests that I can't parallelize since the "work" itself is not CPU consuming, is that correct?

Anyone have ideas on how I can improve the execution time?

 //main interaction function for the day
    // assumption: if a human is bit, and becomes infected.. can not get BIT again
    public void Interaction()
    {
        // need to go through all mosquitos
        for (int i = 0; i < Cnts.MosqPopulation; i++)
        {
            var mosq = Mosquitos[i];

            // see if this mosquito will bite someone            
            if (mosq.bitedistribution[(int)mosq.age - 1] == 1)
            {

                // who will it bite? get a list of good humans to bite
                // they have to non-isolated and they shouldnt have been bit before
                var nonisohumans = Humans.FindAll(x => (x.health != HumanStatus.SymptomaticIsolated
                                                        && x.swap == HumanStatus.Null));

                // pick a random human from the pool of candidates to bite.
                var rhuman = nonisohumans[Utils.random.Next(nonisohumans.Count)];


                // if the human is susceptible and mosquito is symptomatic
                if (rhuman.health == HumanStatus.Susceptible && mosq.health == MosquitoStatus.Infectious)
                {
                    // see if the disease will transfer to him
                    if (Utils.random.NextDouble() < Cnts.Contgion_MtoH)
                    {
                        rhuman.swap = HumanStatus.Latent;
                    }
                }

                // if the human is (a)symptomatic and mosqutio is susceptible
                if ((rhuman.health == HumanStatus.Symptomatic || rhuman.health == HumanStatus.ASymptomatic) && mosq.health == MosquitoStatus.Susceptible)
                {
                    // see if the disease will transfer to the mosquito
                    double probabilityofswap;
                    if (rhuman.health == HumanStatus.Symptomatic)
                    {
                        probabilityofswap = Cnts.Contgion_HtoM;
                    }
                    else
                    {
                        //reduced transmission
                        probabilityofswap = Cnts.Contgion_HtoM * Cnts.ReductionFactor;
                    }

                    if (Utils.random.NextDouble() < probabilityofswap)
                    {
                        mosq.swap = MosquitoStatus.Latent;
                    }
                }
                // Console.WriteLine("Mosquito i:{0} will bite today", i);
                // Console.WriteLine("Its distribution was: {0}", string.Join(",", Mosquitos[i].bitedistribution));
            }

        }

    }
masfenix
  • 7,736
  • 11
  • 45
  • 60
  • http://stackoverflow.com/questions/2260220/c-sharp-findall-vs-where-speed , check FindAll performance – Mohammad Olfatmiri Jul 21 '16 at 17:45
  • 1
    Try to cache as many variables as you can. For example the loop can be optimized itself. You are checking for Cnts.MosqPopulation which is interpreted every iteration! Do a int popCount = Cnts.MosqPopulation; and use it in the statement like for (int i = 0; i < popCount; ++i). Note that ++i instead of i++ may increase loop perfomance. – Markus Zeller Jul 21 '16 at 17:46
  • @Oli replacing the `FindAll` with `Where` dosnt really help. Note that I have to run a .ToList() at the end of Where to keep my code working. – masfenix Jul 21 '16 at 17:54
  • 3
    To continue what Markus said, you should initialize nonisohumans outside of the loop. – Cameron Jul 21 '16 at 17:54
  • @MarkusZeller I further removed the `for` loop altogether. I then used `Parallel.ForEach(Mosquitos, options, x => x.BiteSomeone(x, Humans)` and then in the `BiteSomeone` function, I only have the `if` statement logic – masfenix Jul 21 '16 at 17:58
  • 1
    @masfenix "The slow loop is because of the number of elements and NOT the body of the loop" - Your assumption is wrong. It takes my crappy laptop less than a second to iterate 200K * 900 items, meaning your program spends 99.9% of it's time **working** rather than iterating (which is not surprising at all). – shay__ Jul 21 '16 at 18:13
  • A game about mosquitos? You better not release this. – Hatted Rooster Jul 21 '16 at 18:22
  • 1
    @GillBates Scientific research - zika virus :D – masfenix Jul 21 '16 at 18:22
  • 1
    **Use a profiler**. Stop guessing, find out where you are actually spending time, and focus your optimizations on that. Use good engineering practices. That said, the `FindAll` is almost certainly the problem. How many humans are there in the collection? – Eric Lippert Jul 21 '16 at 18:27

4 Answers4

2

First: use a profiler. Find out where you are really spending time. Roughly a third of the time that I do performance analysis I am completely wrong as to where the problem actually is. Don't guess; measure.

Second: let's assume that it is the FindAll that is your problem; that seems the likeliest culprit. The operation here is:

  • Given a list of humans, construct a new list of humans that match a predicate
  • Choose a random element from that list.

That's a heavyweight operation if the collection of humans is large.

Consider solving the problem by choosing better data structures; do not have a collection of humans at all. Have two (or more) collections of humans: a collection of non-isolated humans and a collection of isolated humans, say.

Let's make some assumptions:

  • You're only going to choose a random item from the non-isolated humans.
  • Non-isolated humans never become isolated humans.
  • Isolated humans can become non-isolated humans.

If these assumptions are true then I would make non-isolated humans a list, and isolated humans a set. When an isolated human becomes a non-isolated human it is removed from the set -- a cheap operation -- and added to the end of the list -- also a cheap operation.

Now you don't need to construct a new list every time you need a random non-isolated human. You already are maintaining that list.

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
1

Building off Oli's comment, try changing:

var nonisohumans = Humans.FindAll(x => (x.health != humanStatus.SymptomaticIsolated && x.swap == HumanStatus.Null));

to:

var nonisohumans = Humans.Where(x => (x.health != humanStatus.SymptomaticIsolated && x.swap == HumanStatus.Null));

and remove it from the loop entirely so it only processes once. Then just run your conditional (x.health != humanStatus.SymptomaticIsolated && x.swap == HumanStatus.Null) check after you get the random value again to make sure you aren't selecting one you've already iterated through (if so, get a new random).

Looks like you should also change your last if statement to an else if so it doesn't get evaluated if the previous one evaluates to true (they look to be mutually exclusive):

else if ((rhuman.health == HumanStatus.Symptomatic || rhuman.health == HumanStatus.ASymptomatic) && mosq.health == MosquitoStatus.Susceptible)
Chris Searles
  • 2,322
  • 17
  • 32
  • 1
    using a `Where` obviously still iterates over everything, you just won't "see" it happening – harold Jul 21 '16 at 17:57
  • @harold but with `FindAll` you have to itterate over it twice, once to filter once (over the reduced set) to process. Where filters as it processes. – Scott Chamberlain Jul 21 '16 at 17:59
  • Oh he did it twice, I stopped reading honestly. I mean that first `Where`, which is pointless. – harold Jul 21 '16 at 18:01
  • However, thinking about it more, it won't work. Chris, please explain what the OP should replace `var rhuman = nonisohumans[Utils.random.Next(nonisohumans.Count)];` wtih, `.Count` is now a expesive operation and you cant do do a indexer on a `IEnumerable`, and don't say do `.ToList()` because that wll just turn it back in to what `FindAll` does so there is no point in switching it. – Scott Chamberlain Jul 21 '16 at 18:04
  • @ChrisSearles See my comment in the question post. I can also run `Parallel.Foreach()` and for each "mosquito" I can just run a function which has the logic inside the for loop. Note that I've already tried this but its the same speed, but maybe you can help – masfenix Jul 21 '16 at 18:09
  • @masfenix where is Cnts.MosqPopulation coming from? In memory or DB? – Chris Searles Jul 21 '16 at 18:11
  • @ChrisSearles its a `public static readonly` variable. But moving the `var isohumans` OUTSIDE the loop brought the time significantly. I do have a `ToList()` appended to it though. Secondly, how do I remove my `rhuman` from this list now – masfenix Jul 21 '16 at 18:12
  • You just need to validate your condition again or remove that element from the collection after you iterate over it. – Chris Searles Jul 21 '16 at 18:13
  • @ChrisSearles I don't know how to remove the random human `rhuman` from my list now, considering my humans don't have `id` or any other identifier – masfenix Jul 21 '16 at 18:13
  • @ChrisSearles Also, it still takes about 45 ms to run. I am trying to bring this down even further because these are scientific computations and I need to run a very large number of simulations – masfenix Jul 21 '16 at 18:14
  • I think that's where you'll need to look at a parallel loop now that you've eliminated the primary bottleneck. – Chris Searles Jul 21 '16 at 18:17
  • @ChrisSearles Thanks. Lastly, for the removal, after I get my random `rhuman`, can I just say `nonisohumans.Remove(rhuman)`.. will it remove the "correct" from the list? – masfenix Jul 21 '16 at 18:18
  • @ChrisSearles running `RemoveAt()` with the index of the human brings the time back to what it was before. Any way I can remove this human faster – masfenix Jul 21 '16 at 18:27
  • Thought that might be a lot slower, try just evaluating your x.health != HumanStatus.SymptomaticIsolated && x.swap == HumanStatus.Null condition again – Chris Searles Jul 21 '16 at 18:28
1

To begin, I would have tried to optimize the cycle. For example, to make the block from a cycle. It is, as far as I can see, does not depend on the cycle parameters. Place it before the cycle.

var nonisohumans = Humans.FindAll(x => (x.health != HumanStatus.SymptomaticIsolated
                                                    && x.swap == HumanStatus.Null));

    Parallel.For(0, Cnts.MosqPopulation, i => 
    {
        var mosq = Mosquitos[i];

        // see if this mosquito will bite someone            
        if (mosq.bitedistribution[(int)mosq.age - 1] == 1)
        {
            // pick a random human from the pool of candidates to bite.
            var rhuman = nonisohumans[Utils.random.Next(nonisohumans.Count)];
       ...
    });

How big collection of Humans? You're fully pass it on each iteration.

Other places in the loop should not cause problems.

And of course you can use "Parallel.For" method to parallelize the loop.

  • 1
    rhuman still has to be chosen every cycle. Simply initialise nonisohumans before the loop and then remove remove each rhuman from nonisohumans after it is chosen. – xcvd Jul 21 '16 at 17:58
0

You can get rid of the FindAll altogether. It seems a lot cheaper to randomly pick one from the entire collection until you get one that meets the condition. While you hit ones that do not meet the condition, you remove them from the collection. You would start with a copy of your full collection.

Martin Maat
  • 714
  • 4
  • 23