This is a two-part question.

I have programmatically determined a range of double values:

    public static void Main(string[] args)
    {
        var startRate = 0.0725;
        var rateStep = 0.001;
        var maxRate = 0.2;
        var stepsFromStartToMax = (int)Math.Ceiling((maxRate-startRate)/rateStep);

        var allRateSteps = Enumerable.Range(0, stepsFromStartToMax)
            .Select(i => startRate + (maxRate - startRate) * ((double)i / (stepsFromStartToMax - 1)))
            .ToArray();
        foreach (var i in allRateSteps)
        {
            Console.WriteLine(i); //  this prints the correct values
        }
    }

I would like to divide this list of numbers up into chunks based on the processor count, which I can get from Environment.ProcessorCount (usually 8). Ideally, I would end up with something like a List of Tuples, where each Tuple contains the start and end values for each chunk:

[(0.0725, 0.0813), (0.0815, 0.0955), ...]

1) How do you select out the inner ranges in less code, without having to know how many tuples I will need? I've come up with a long way to do this with loops, but I'm hoping LINQ can help here:

        var counter = 0;
        var listOne = new List<double>();
        //...
        var listEight = new List<double>();
        foreach (var i in allRateSteps)
        {
            counter++;
            if (counter < allRateSteps.Length/8)
            {
                listOne.Add(i);
            }
            //...
            else if (counter < allRateSteps.Length/1)
            {
                listEight.Add(i);
            }
        }
        // Now that I have lists, I can get their First() and Last() to create tuples
        var tupleList = new List<Tuple<double, double>>{
            new Tuple<double, double>(listOne.First(), listOne.Last()),
            //...
            new Tuple<double, double>(listEight.First(), listEight.Last())
        };

Once I have this new list of range Tuples, I want to use each of them as the basis for a parallel loop that writes to a ConcurrentDictionary under certain conditions. I'm not sure how to get this code into my loop...

I've got this piece of code working on multiple threads, but 2) how do I evenly distribute the work across all processors based on the ranges I've defined in tupleList:

        var maxRateObj = new ConcurrentDictionary<string, double>();
        var startTime = DateTime.Now;
        Parallel.For(0,
                     stepsFromStartToMax,
                     new ParallelOptions
                     {
                         MaxDegreeOfParallelism = Environment.ProcessorCount
                     },
                     x =>
                     {
                         var i = (x * rateStep) + startRate;
                         Console.WriteLine("{0} : {1} : {2} ",
                                           i,
                                           DateTime.Now - startTime,
                                           Thread.CurrentThread.ManagedThreadId);
                         if (!maxRateObj.Any())
                         {
                             maxRateObj["highestRateSoFar"] = i;
                         }
                         else
                         {
                             if (i > maxRateObj["highestRateSoFar"])
                             {
                                 maxRateObj["highestRateSoFar"] = i;
                             }
                         }
                     });

This prints out, e.g.:

...
0.1295 : 00:00:00.4846470 : 5 
0.0825 : 00:00:00.4846720 : 8 
0.1645 : 00:00:00.4844220 : 6 
0.0835 : 00:00:00.4847510 : 8 
...

Thread1 needs to handle the range in the first tuple, thread2 handles the range defined in the second tuple, etc., where i is defined by the range in the loop. Again, the number of range tuples will depend on the number of processors. Thanks.

JacobIRR
  • There are many existing Q&As on the site for batching w/LINQ, but creating batches to manually schedule across cores is likely unnecessary and less performant than letting a [Task Scheduler](https://msdn.microsoft.com/en-us/library/dd997402(v=vs.100).aspx) handle it for you. Why not just create your own `Thread`s and have each process its own list of inputs? – Taylor Wood Feb 18 '18 at 01:32
  • @TaylorWood - can you create a variable number of threads dynamically in this context? – JacobIRR Feb 18 '18 at 01:33
  • Yes, you could create a dynamically-sized collection of threads, but why not use Tasks on a thread pool? – Taylor Wood Feb 18 '18 at 01:35
  • This part is confusing: "How do you select out the inner ranges in less code, without having to know how many tuples I will need?" From the rest of the question it sounds like you're dealing with known quantities and inputs. Why don't/can't you know how many tuples you will need? – Scott Hannen Feb 18 '18 at 01:36
  • @ScottHannen - I want to use LINQ to get the batches in `n` parts where `n=Environment.ProcessorCount`. I'm hardcoding the assumption of 8 processors in my code because I have no way to show the optimized way (which I do not yet know) – JacobIRR Feb 18 '18 at 01:38
  • @TaylorWood - would that method just involve creating a new Task inside a for loop whose range is set by the processor count? – JacobIRR Feb 18 '18 at 01:55
  • I think it would help a lot if you described at a high level what you're ultimately trying to accomplish. What's the point of `maxRateObj`? Your `Parallel.For` code doesn't reference any tuples/batches created above. – Taylor Wood Feb 18 '18 at 02:36
  • I have a loop that currently takes > 30 seconds to complete, so I want to spread it across many threads to speed it up. I have to manually run a calculation for every single `double` in the `allRateSteps` list that I create. Each time the loop runs, if the rate meets certain conditions for a calculation (omitted here for brevity), the maxRateObj will get this value assigned to it for later use. The second part of my question is asking how to integrate the tuples into the loop. – JacobIRR Feb 18 '18 at 02:43

1 Answer

> I would like to divide this list of numbers up into chunks based on the processor count

There are many possible implementations for a LINQ Batch method.
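
`Batch` is not part of LINQ itself. Libraries such as MoreLINQ provide one, and .NET 6 and later include a similar built-in `Enumerable.Chunk`. If you would rather write your own, a minimal sketch might look like the following (the class name `BatchExtensions` and the `IReadOnlyList<T>` return type are illustrative choices, not taken from any particular library):

```csharp
using System;
using System.Collections.Generic;

public static class BatchExtensions
{
    // Splits a sequence into consecutive batches of at most `size` items.
    // The last batch may be shorter than the others.
    public static IEnumerable<IReadOnlyList<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        return BatchIterator(source, size);
    }

    // Kept separate so the argument checks above run eagerly rather than
    // on first enumeration.
    private static IEnumerable<IReadOnlyList<T>> BatchIterator<T>(IEnumerable<T> source, int size)
    {
        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }
        // Emit whatever is left over as a final, shorter batch.
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}
```

Note that the argument is the size of each batch, not the number of batches.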

> How do you select out the inner ranges in less code, without having to know how many tuples I will need?

Here's one way to handle that:

var batchRanges = from batch in allRateSteps.Batch(anyNumberGoesHere)
                  let first = batch.First()
                  let last = batch.Last()
                  select Tuple.Create(first, last);

(0.0725, 0.0795275590551181)
(0.0805314960629921, 0.0875590551181102)
(0.0885629921259842, 0.0955905511811024)
...
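
The output above corresponds to batches of 8 values each. If what you want is roughly `Environment.ProcessorCount` batches instead, one option is to derive the batch size from the array length. Here is a self-contained sketch using only built-in LINQ operators (`RangeSplitter` and `SplitIntoRanges` are hypothetical names introduced for this example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeSplitter
{
    // Splits `values` into at most `parts` consecutive chunks and returns
    // the first and last value of each chunk.
    public static List<Tuple<double, double>> SplitIntoRanges(double[] values, int parts)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (parts <= 0) throw new ArgumentOutOfRangeException(nameof(parts));

        // Integer ceiling division, so every value lands in some chunk.
        // Math.Max guards against a chunk size of zero for an empty array.
        var chunkSize = Math.Max(1, (values.Length + parts - 1) / parts);

        return values
            .Select((value, index) => new { value, index })
            .GroupBy(x => x.index / chunkSize, x => x.value)
            .Select(g => Tuple.Create(g.First(), g.Last()))
            .ToList();
    }
}
```

With the question's data this would be called as `RangeSplitter.SplitIntoRanges(allRateSteps, Environment.ProcessorCount)`. When the length does not divide evenly, the last chunk is shorter, and there may be fewer chunks than `parts`.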

> how do I evenly distribute the work across all processors based on the ranges I've defined in tupleList

This part of your example doesn't reference tupleList so it's hard to see the desired behavior.

> Thread1 needs to handle the ranges in the first tuple, thread2 handles the range defined in the second tuple, etc...

Unless you have some hard requirement that certain threads process certain batches, I would strongly suggest generating your work as a single "stream" and using a higher-level abstraction for parallelism e.g. PLINQ.
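
As a rough sketch of that "single stream" idea applied to the question's loop: the rates can be generated with `ParallelEnumerable.Range`, filtered, and reduced with `Max`, which also avoids the separate check and write on the shared dictionary (those two steps are not atomic across threads). `MeetsCondition` below is a hypothetical stand-in for the calculation the question omits, and `RateSearch` is just an illustrative name:

```csharp
using System;
using System.Linq;

public static class RateSearch
{
    // Hypothetical stand-in for the expensive per-rate calculation that the
    // question leaves out; replace it with the real condition.
    private static bool MeetsCondition(double rate)
    {
        return rate < 0.15;
    }

    // Returns the highest rate that satisfies the condition,
    // or null if no rate does.
    public static double? FindHighestRate(double startRate, double rateStep, int steps)
    {
        return ParallelEnumerable.Range(0, steps)
            .Select(x => startRate + x * rateStep)
            .Where(rate => MeetsCondition(rate))
            .Select(rate => (double?)rate)   // nullable so an empty result yields null
            .Max();
    }
}
```

PLINQ decides how to partition the index range across threads, so no explicit list of range tuples is needed.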

If you just want to do the work in batches, you can still do that without caring which thread(s) the work runs on:

static void Work(IEnumerable<int> ints) {
  var sum = ints.Sum();
  Thread.Sleep(sum);  // simulate work proportional to the batch's sum
  Console.WriteLine(sum);
}

public static void Main (string[] args) {
  var inputs = from i in Enumerable.Range(0, 100)
               select i + i;
  var batches = inputs.Batch(8);
  // Start one task per batch; ToArray() runs the lazy query exactly once.
  var tasks = from batch in batches
              select Task.Run(() => Work(batch));
  Task.WaitAll(tasks.ToArray());
}

The default TaskScheduler is coordinating the work for you behind the scenes, and it'll likely outperform hand-rolling your own threading scheme.

Also consider something like this:

static int Work(IEnumerable<int> ints) {
  Console.WriteLine("Work on thread " + Thread.CurrentThread.ManagedThreadId);
  var sum = ints.Sum();
  Thread.Sleep(sum);
  return sum;
}

public static void Main (string[] args) {
  var inputs = from i in Enumerable.Range(0, 100)
               select i + i;
  var batches = inputs.Batch(8);
  // Call AsParallel() before the projection so that Work itself runs in
  // parallel; calling it afterwards only parallelizes consuming the results.
  var results = from batch in batches.AsParallel()
                select Work(batch);
  foreach (var result in results) {
    Console.WriteLine(result);
  }
}

/*
Work on thread 6
Work on thread 4
56
Work on thread 4
184
Work on thread 4
Work on thread 4
312
440
...
*/
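
If you do want explicit ranges, the framework can generate them for you: `Partitioner.Create(fromInclusive, toExclusive)` yields `Tuple<int, int>` index ranges that `Parallel.ForEach` distributes across threads. The sketch below is one possible shape for the question's loop, not a drop-in replacement. `MeetsCondition` is again a hypothetical stand-in for the omitted calculation, and `RangePartitionDemo` is an illustrative name:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class RangePartitionDemo
{
    // Hypothetical stand-in for the per-rate condition omitted in the question.
    private static bool MeetsCondition(double rate)
    {
        return rate < 0.15;
    }

    // Returns the highest qualifying rate, or negative infinity if there is none.
    public static double FindHighestRate(double startRate, double rateStep, int steps)
    {
        var highest = double.NegativeInfinity;
        if (steps <= 0) return highest;  // Partitioner.Create needs a non-empty range
        var gate = new object();

        // Each range is (inclusive start index, exclusive end index).
        Parallel.ForEach(Partitioner.Create(0, steps), range =>
        {
            var localHighest = double.NegativeInfinity;
            for (var x = range.Item1; x < range.Item2; x++)
            {
                var rate = startRate + x * rateStep;
                if (MeetsCondition(rate) && rate > localHighest)
                {
                    localHighest = rate;
                }
            }
            // Merge once per range, under a lock, instead of once per rate.
            lock (gate)
            {
                if (localHighest > highest) highest = localHighest;
            }
        });
        return highest;
    }
}
```

Keeping a per-range local maximum and merging it once keeps contention on the shared value low.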
Taylor Wood
  • Thanks for this great answer. I'm just realizing that I don't actually need to 'divide the work to be done' into some kind of data structure like `listTuple` and then supply that to the parallel function, because 'dividing the work to be done' and distributing it across threads is the very purpose of using something like this. – JacobIRR Feb 19 '18 at 23:43