Here's the list of tasks that I need to complete:

  1. Read a chunk of the file (Disk IO Bound)
  2. Encrypt said chunk (CPU Bound)
  3. Upload said chunk (Network IO Bound)
  4. Repeat until file is uploaded

The problem lies in how to accomplish this with maximum efficiency and performance.

I've tried wrapping the entire operation in Parallel.For, but I don't think that's the best approach, since each operation has different characteristics (as noted in the list above) that could be exploited.

After reading this TPL article suggested in this question, and after reviewing the empirical data in that question, I think TPL is the way I want to go. But how should I break this up for maximum efficiency and performance? Should I even bother trying to multi-thread the first two operations considering the upload is likely to be the bottleneck of the whole operation?

Thanks for your input.

Edit:

I've tried using Tasks and ContinueWith to let the OS deal with it, but I think I'm hitting another wall -- when I wait for all of my Upload tasks to complete, it seems like the garbage collector isn't cleaning up the data that I read in to upload and as such I end up running out of memory. Yet another bound to consider.

Brian D
  • Can whatever is at the other end of your upload handle it if the chunks come out of order? If it can't there's not much to be gained with splitting it up. – Mike Parkhill Dec 20 '12 at 00:45
  • I'm just uploading to a blob in Azure, and I can safely write chunks in any order to it given the chunk offset. – Brian D Dec 20 '12 at 00:56
  • Yes, 4.5 is available to me. – Brian D Dec 20 '12 at 20:16
  • I like this question. Can we see some pseudocode for what you have right now? I feel like you are mostly bound by how fast your disk IO is. You might just read from your disk normally and hand the data to an encrypter class that runs multi-threaded. Then you can hand each completed chunk to your uploader, possibly also running in parallel. I am not sure you would gain a ton of benefit from that though. – Ryan Bennett Dec 20 '12 at 21:33
  • Either way, encapsulating each of these tasks will make it easy for you to experiment – Ryan Bennett Dec 20 '12 at 21:34

1 Answer

If you couldn't use .Net 4.5, I would suggest using one thread for reading from the disk, one thread for encrypting and one thread for uploading. To communicate between them, you would use the producer-consumer pattern in the form of a BlockingCollection<byte[]> between each pair of threads (1-2 and 2-3).
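A minimal sketch of that three-thread variant might look like the following. The names ThreeStagePipeline and Run are made up for illustration, Encrypt is a hypothetical XOR stand-in rather than real encryption, and plain Stream objects stand in for the file and the upload target. Error handling is deliberately thin: if a later stage fails, an earlier stage can stay blocked on a full queue.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class ThreeStagePipeline
{
    // Hypothetical stand-in for real encryption: XORs every byte with a fixed key.
    static byte[] Encrypt(byte[] data)
    {
        var result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = (byte)(data[i] ^ 0x5A);
        return result;
    }

    public static void Run(Stream input, Stream output, int chunkSize, int capacity)
    {
        // Bounded queues make a fast stage wait instead of piling up chunks in memory.
        using (var toEncrypt = new BlockingCollection<byte[]>(capacity))
        using (var toUpload = new BlockingCollection<byte[]>(capacity))
        {
            // Stage 1: read chunks from the input stream.
            var reader = Task.Factory.StartNew(() =>
            {
                try
                {
                    while (true)
                    {
                        var buffer = new byte[chunkSize];
                        int read = input.Read(buffer, 0, buffer.Length);
                        if (read == 0) break;
                        // Trim short reads so no unused buffer tail is passed on.
                        if (read < buffer.Length) Array.Resize(ref buffer, read);
                        toEncrypt.Add(buffer);
                    }
                }
                finally { toEncrypt.CompleteAdding(); }
            }, TaskCreationOptions.LongRunning);

            // Stage 2: encrypt each chunk and pass it on.
            var encryptor = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (var chunk in toEncrypt.GetConsumingEnumerable())
                        toUpload.Add(Encrypt(chunk));
                }
                finally { toUpload.CompleteAdding(); }
            }, TaskCreationOptions.LongRunning);

            // Stage 3: write ("upload") each encrypted chunk.
            var uploader = Task.Factory.StartNew(() =>
            {
                foreach (var chunk in toUpload.GetConsumingEnumerable())
                    output.Write(chunk, 0, chunk.Length);
            }, TaskCreationOptions.LongRunning);

            Task.WaitAll(reader, encryptor, uploader);
        }
    }
}
```

Because each stage here is a single consumer reading from a FIFO queue, chunks reach the output in the order they were read. The capacity argument is what keeps memory use bounded when the upload is slower than the read.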

But since you can use .Net 4.5, you can use TPL Dataflow, which is a perfect fit for this task. Using TPL Dataflow means you won't waste threads on reading and uploading (though that most likely won't matter much to you). More importantly, it means you can easily parallelize encryption of each chunk (assuming you can do that).

You would have one block for encryption, one block for uploading and one asynchronous task (actually, it doesn't have to be a full Task) for reading from the file. The block for encryption could be configured to execute in parallel, and both blocks should be configured with some maximum capacity (otherwise, throttling wouldn't work correctly and the whole file would be read as fast as possible, which could lead to an OutOfMemoryException).

In code:

var uploadBlock = new ActionBlock<byte[]>(
    data => uploadStream.WriteAsync(data, 0, data.Length),
    new ExecutionDataflowBlockOptions { BoundedCapacity = capacity });

var encryptBlock = new TransformBlock<byte[], byte[]>(
    data => Encrypt(data),
    new ExecutionDataflowBlockOptions
    {
        BoundedCapacity = capacity,
        MaxDegreeOfParallelism = degreeOfParallelism
    });

encryptBlock.LinkTo(
    uploadBlock,
    new DataflowLinkOptions { PropagateCompletion = true });

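while (true)
{
    byte[] chunk = new byte[chunkSize];
    int read = await fileStream.ReadAsync(chunk, 0, chunk.Length);
    if (read == 0)
        break;
    // ReadAsync may return fewer bytes than requested (typically for the
    // last chunk); trim the buffer so its unused tail isn't uploaded.
    if (read < chunk.Length)
        Array.Resize(ref chunk, read);
    // SendAsync waits while encryptBlock is full, which throttles reading.
    await encryptBlock.SendAsync(chunk);
}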

fileStream.Close();
encryptBlock.Complete();
await uploadBlock.Completion;
uploadStream.Close();
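To see the throttling effect of BoundedCapacity described above, a small experiment along these lines may help. It is only a sketch: BackpressureDemo and MaxInFlightAsync are hypothetical names, and Task.Delay stands in for a slow upload. It counts how many chunks have been created but not yet processed, including the one currently waiting in SendAsync. On .Net 4.5, TPL Dataflow comes from the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BackpressureDemo
{
    // Returns the largest number of chunks that were alive at the same time
    // (queued, being processed, or waiting to be accepted by the block).
    public static async Task<int> MaxInFlightAsync(int capacity, int totalChunks)
    {
        int inFlight = 0;
        int maxInFlight = 0;

        var slowUpload = new ActionBlock<byte[]>(
            async data =>
            {
                await Task.Delay(10); // stand-in for a slow network upload
                Interlocked.Decrement(ref inFlight);
            },
            new ExecutionDataflowBlockOptions { BoundedCapacity = capacity });

        for (int i = 0; i < totalChunks; i++)
        {
            int now = Interlocked.Increment(ref inFlight);
            maxInFlight = Math.Max(maxInFlight, now);
            // Completes only once the block has room for the chunk.
            await slowUpload.SendAsync(new byte[1024]);
        }

        slowUpload.Complete();
        await slowUpload.Completion;
        return maxInFlight;
    }
}
```

With a bounded block the returned value should stay at or below capacity + 1 however many chunks are sent. Leaving BoundedCapacity at its default (unbounded) lets the producer run ahead of the consumer, which is the situation that leads to running out of memory.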
svick