I'm trying to determine how AsParallel() splits its 'source', and indeed what is meant by 'source'...

For example...

public class CSVItem
{
    public DateTime Date { get; set; }
    public string AccountNumber { get; set; }
}

List<CSVItem> CSVItemList = new List<CSVItem>();

Then put 500k varying CSVItem objects into CSVItemList.

Then use:

CSVItemList = CSVItemList.AsParallel().OrderBy(x => x.AccountNumber).ThenBy(q => q.Date).ToList();

Will it only split the 'source' (for example, 250k records onto each of two threads) across multiple asynchronous threads, perform the OrderBy().ThenBy() on each thread, and then merge the results...

Or will it separate the OrderBy() and ThenBy() onto separate threads and run them and then merge the results... giving a strangely ordered list?

Paul Zahra
  • [this](http://download.microsoft.com/download/B/C/F/BCFD4868-1354-45E3-B71B-B851CD78733D/WhenToUseParallelForEachOrPLINQ.pdf) article might hold the answers to your question (bottom of page 5). I found it in the [answer](http://stackoverflow.com/questions/3789998/parallel-foreach-vs-foreachienumerablet-asparallel) of another question – Nick Otten Jul 10 '15 at 12:47

2 Answers

It goes one step at a time: (a) it finishes the OrderBy and merges the result, and then (b) it moves on to the ThenBy. The image below, from the Albahari blog, shows how it works, i.e. it takes the operators one by one.

[Image: diagram from the Albahari blog]

Q: How many tasks are used?

A: You can control this with WithDegreeOfParallelism, which sets the maximum number of tasks PLINQ runs simultaneously.

   // Run the query with at most 5 concurrent tasks
   var query = CSVItemList.AsParallel().WithDegreeOfParallelism(5);
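
For a fuller picture, here is a minimal sketch of the same call inside a complete query. The class name `DegreeOfParallelismSketch` and the helper `SortWithCap` are made up for illustration and are not part of PLINQ. The value passed to WithDegreeOfParallelism is treated here as an upper bound on concurrent tasks, not a promise that exactly that many will run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DegreeOfParallelismSketch
{
    // Hypothetical helper: sorts the numbers with PLINQ, using at most
    // maxTasks concurrently executing tasks.
    public static List<int> SortWithCap(IEnumerable<int> source, int maxTasks)
    {
        return source
            .AsParallel()
            .WithDegreeOfParallelism(maxTasks) // upper bound on concurrent tasks
            .OrderBy(n => n)
            .ToList();
    }

    public static void Main()
    {
        // 999, 998, ..., 0
        IEnumerable<int> numbers = Enumerable.Range(0, 1000).Reverse();

        List<int> sorted = SortWithCap(numbers, 5);

        // The result is fully sorted regardless of how many tasks were used.
        Console.WriteLine(sorted.SequenceEqual(Enumerable.Range(0, 1000))); // True
    }
}
```

Changing the cap should only affect how the work is scheduled, not the order of the result.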

See also: Parallel Programming

Pranay Rana
  • Are you sure? Generally speaking MSDN implies it's a bit more hazy than that... "The query partitions the source into tasks that are executed asynchronously on multiple threads. The order in which each task completes depends not only on the amount of work involved to process the elements in the partition, but also on external factors such as how the operating system schedules each thread." See https://msdn.microsoft.com/en-us/library/dd460714%28v=vs.110%29.aspx – Paul Zahra Jul 10 '15 at 12:39
  • @PaulZahra - I use http://www.albahari.com/threading/part5.aspx as my reference when it comes to Parallel LINQ. – Pranay Rana Jul 10 '15 at 12:40

I created a small example to check which one is true.

using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    static void Main(string[] args)
    {
        List<TestItem> items = new List<TestItem>();
        List<TestItem> itemsNonParallel = new List<TestItem>();

        items.Add(new TestItem() { Age = 1, Size = 12 });
        items.Add(new TestItem() { Age = 2, Size = 1 });
        items.Add(new TestItem() { Age = 5, Size = 155 });
        items.Add(new TestItem() { Age = 23, Size = 42 });
        items.Add(new TestItem() { Age = 7, Size = 32 });
        items.Add(new TestItem() { Age = 9, Size = 22 });
        items.Add(new TestItem() { Age = 34, Size = 11 });
        items.Add(new TestItem() { Age = 56, Size = 142 });
        items.Add(new TestItem() { Age = 300, Size = 13 });

        itemsNonParallel.Add(new TestItem() { Age = 1, Size = 12 });
        itemsNonParallel.Add(new TestItem() { Age = 2, Size = 1 });
        itemsNonParallel.Add(new TestItem() { Age = 5, Size = 155 });
        itemsNonParallel.Add(new TestItem() { Age = 23, Size = 42 });
        itemsNonParallel.Add(new TestItem() { Age = 7, Size = 32 });
        itemsNonParallel.Add(new TestItem() { Age = 9, Size = 22 });
        itemsNonParallel.Add(new TestItem() { Age = 34, Size = 11 });
        itemsNonParallel.Add(new TestItem() { Age = 56, Size = 142 });
        itemsNonParallel.Add(new TestItem() { Age = 300, Size = 13 });

        foreach (var item in items.AsParallel().OrderBy(x => x.Age).ThenBy(x => x.Size))
        {
            Console.WriteLine($"Age: {item.Age}     Size: {item.Size}");
        }

        Console.WriteLine("---------------------------");

        foreach (var item in itemsNonParallel.OrderBy(x => x.Age).ThenBy(x => x.Size))
        {
            Console.WriteLine($"Age: {item.Age}     Size: {item.Size}");
        }

        Console.ReadLine();        
    }
}

public class TestItem
{
    public int Age { get; set; }
    public int Size { get; set; }
}

Result

AsParallel() does what we want. It first processes the OrderBy() in parallel, merges the list back together, and then moves on to the next query operator, in our case ThenBy().
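
To run the same check on a much larger list without reading the output by eye, something like the following sketch could be used. The names `OrderCheckSketch` and `SameOrder` are hypothetical. It uses anonymous types, which compare by value, so SequenceEqual looks at the (Age, Size) values rather than at object references. It only tests that the two results agree; it says nothing about how PLINQ arrives at them internally.

```csharp
using System;
using System.Linq;

public static class OrderCheckSketch
{
    // Hypothetical helper: builds `count` random items and returns true when
    // the parallel and the sequential query yield the same (Age, Size) order.
    public static bool SameOrder(int count, int seed)
    {
        var random = new Random(seed);

        // Anonymous types override Equals, so two items with the same
        // Age and Size are considered equal by SequenceEqual.
        var items = Enumerable.Range(0, count)
            .Select(_ => new { Age = random.Next(0, 100), Size = random.Next(0, 1000) })
            .ToList();

        var parallel = items.AsParallel()
            .OrderBy(x => x.Age)
            .ThenBy(x => x.Size)
            .ToList();

        var sequential = items
            .OrderBy(x => x.Age)
            .ThenBy(x => x.Size)
            .ToList();

        return parallel.SequenceEqual(sequential);
    }

    public static void Main()
    {
        Console.WriteLine(SameOrder(500000, 42)); // True
    }
}
```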

greenhoorn
  • A list compare would be useful... not sure list1.Except(list2) is correct for this though, as we want to check the order and not the content. Then we could run the code against humongous lists and compare easily. – Paul Zahra Jul 10 '15 at 12:45
  • Hmm, see fiddle here... `SequenceEqual` gives false, although the lists look the same... https://dotnetfiddle.net/KLOY1W – Paul Zahra Jul 10 '15 at 12:59
  • @PaulZahra: `SequenceEqual` returns false because when two objects are checked for equality, by default it goes by whether they are referencing the same object. The easy fix is to store the same objects in both lists by replacing the code to add a bunch of TestItems to `itemsNonParallel` with `itemsNonParallel = items.ToList();`. – Risky Martin Jul 10 '15 at 20:49
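
The reference-equality behaviour described in the last comment can be reproduced with a short sketch. `TestItem` mirrors the class from the answer; `SequenceEqualSketch` and its two methods are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TestItem
{
    public int Age { get; set; }
    public int Size { get; set; }
}

public static class SequenceEqualSketch
{
    // Two lists holding different objects with identical property values.
    public static bool CompareSeparateCopies()
    {
        var a = new List<TestItem> { new TestItem { Age = 1, Size = 12 } };
        var b = new List<TestItem> { new TestItem { Age = 1, Size = 12 } };

        // TestItem does not override Equals, so elements are compared by reference.
        return a.SequenceEqual(b);
    }

    // Two lists holding the very same objects.
    public static bool CompareSharedReferences()
    {
        var a = new List<TestItem> { new TestItem { Age = 1, Size = 12 } };
        var b = a.ToList(); // new list, same object references

        return a.SequenceEqual(b);
    }

    public static void Main()
    {
        Console.WriteLine(CompareSeparateCopies());   // False
        Console.WriteLine(CompareSharedReferences()); // True
    }
}
```

An alternative, if the lists must hold separate objects, would be to override Equals and GetHashCode on TestItem or to pass an IEqualityComparer<TestItem> to SequenceEqual.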