1

I am copying large amounts of files from one server to several other servers.

I have a list of copy operations which have an Order so The files to Server a are first in the list and the files to server b come at the end of the list.

I am using a AsParallel.AsOrdered() on the list to make sure that files are copied to the first server first then only when threads are free to the second server.

It Doesn't seem to be working as it will copy files to both servers at the same time, am I missing something, is there a better way of writing this code. I checked with large files to really see the order and it seems to not relect the list order

var dotheCopy = copy.ToArray().OrderBy(a => a.GroupId);

var copyList = new List<CopyOps>();

foreach (var x in dotheCopy)
{
    ... some validation code 
    copyList.Add(x);
}

copyList.AsParallel().AsOrdered().WithDegreeOfParallelism(5)
    .ForAll(x => this.DoTheCopy(x));
royhowie
  • 11,075
  • 14
  • 50
  • 67
MicroMan
  • 1,988
  • 4
  • 32
  • 58
  • It seems like your actual goal it to process your items in order, but in batches of 5, correct? – dcastro Nov 27 '14 at 19:38
  • What we need to do is first copy to the first server but as threads become available start copying top the second server, we have a max degree of parallelism of 5 – MicroMan Nov 27 '14 at 19:42

2 Answers2

4

AsOrdered doesn't guarantee execution order. It guarantees that the results will be ordered.

Enumerable.Range(0, 3).AsParallel().Select(x => x); // 3,1,2,

Enumerable.Range(0, 3).AsParallel().AsOrdered().Select(x => x); // 1,2,3

Since ForAll doesn't return anything, then combining it with AsOrdered is useless.

It seems like your actual goal it to process your items in order, but in batches of 5, correct? In other words, you need throttling. If so, you can use ActionBlock<T> from TPL Dataflow

var block = new ActionBlock<CopyOps>(DoTheCopy, 
        new ExecutionDataflowBlockOptions
         {
            MaxDegreeOfParallelism = 5
         });

var tasks = copyList.Select(item => block.SendAsync(item));

await Task.WhenAll(tasks);

Or, a simpler approach, if blocking the main thread is not an issue

Parallel.ForEach(copyList,
                 new ParallelOptions { MaxDegreeOfParallelism = 5 },
                 DoTheCopy);
dcastro
  • 66,540
  • 21
  • 145
  • 155
  • What we need to do is first copy to the first server but as threads become available start copying top the second server, we have a max degree of parallelism of 5 – MicroMan Nov 27 '14 at 19:40
  • I assume the switch over logic is somewhere inside the `DoTheCopy` method, and I honestly don't see how that's relevant to the problem. Why do items have to be processed in order? What if all items take 1 second to be sent, but this one file takes 10 minutes? Should the other 4 threads sit there waiting or keep pumping other files? – dcastro Nov 27 '14 at 19:45
  • @dcastro your statement "AsOrdered doesn't guarantee execution order. It guarantees that the results will be ordered." seems to imply that there can be worst case, where I can't start enumerating the results, unless all elements in the original sequence have been processed (In parallel ofcourse).. Is that right? Can you please provide some other reference to support this statement? – Vikas Gupta Nov 27 '14 at 19:46
  • @dcastro we need all files present on server a as a priority and when all files are being copid there and then threads become available i would like to start copying to server B, i hope this makes sense :). The destinations are built into the CopyOps class and are ordered with server a first then at the end of the list is server b. – MicroMan Nov 27 '14 at 19:50
  • Would this be a better way to achieve the goal? Parallel.For( 0, copyList.Count, new ParallelOptions { MaxDegreeOfParallelism = 5 }, index => this.DoTheCopy(copyList[index])); – MicroMan Nov 27 '14 at 19:52
  • @Jeepers Ah, that makes more sense now. Well, the code above meets those requirements, give it a try. – dcastro Nov 27 '14 at 19:54
  • @Jeepers That would work too, but you'd be blocking the main thread. If that's not a problem, then sure, use it. `Parallel.ForEach` would be better suited for your situation though, instead of `Parallel.For`. – dcastro Nov 27 '14 at 19:55
  • @VikasGupta Here's a [reference](http://stackoverflow.com/a/18028469/857807) (It's not from MSDN, but I'm sure you can find one). That seems like a logical conclusion. If the first item takes longer to be processed than all other items put together, then yes, worst case you'll have to wait for all items to be processed before being able to enumerate the results. – dcastro Nov 27 '14 at 20:23
2

Here is some documentation from MSDN -

Order Preservation in PLINQ

"ForAll<TSource> - Result when the source sequence is ordered - Executes nondeterministically in parallel"

So seems like ForAll doesn't honor the order.

Vikas Gupta
  • 4,455
  • 1
  • 20
  • 40
  • Thanks, do you have a suggestion on how to refactor the code then? – MicroMan Nov 27 '14 at 19:36
  • @Jeepers May I ask what is the specific reason for this order? If you must, could you do the copy itself in two batches (i.e. have two lists processed (in parallel) one after the other). Assuming lists have sufficient work to do, you'd loose only a little bit of parallelism, by splitting the original list into two. This is the simplest (minimum tweaks) solution I can think of to your original problem.... Last, but not the least, consider using Async pattern for doing IO operations, but that would be out scope to cover in detail in this question. – Vikas Gupta Nov 27 '14 at 19:53