
I have a component that processes multiple web requests, each in a separate thread. Each WebRequest is processed synchronously.

using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Net;
using System.Threading;

public class WebRequestProcessor : Component
{
    List<Thread> tlist = new List<Thread>();
    List<string> urlList = new List<string>(); // the URLs to download, filled elsewhere

    public void Start()
    {
        foreach (string url in urlList)
        {
            // Create the thread object. This does not start the thread.
            Worker workerObject = new Worker();
            Thread workerThread = new Thread(state => workerObject.DoWork((string)state));

            // Start the worker thread, passing the URL as its argument.
            workerThread.Start(url);
            tlist.Add(workerThread);
        }
    }
}

public class Worker
{
    // This method will be called when the thread is started.
    public void DoWork(string url)
    {
        // prepare the web page we will be asking for
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        // execute the request; the response is disposed when we are done
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        // we will read data via the response stream
        using (Stream resStream = response.GetResponseStream())
        {
            // process stream
        }
    }
}

Now I need to find the best way to cancel all of the requests.

One way is to convert each synchronous WebRequest into an asynchronous one and use WebRequest.Abort to cancel processing.

Another way is to release the references to the threads and let all the threads die via the GC.

BoltClock
walter
  • “allow all threads to die using GC”. That's not how threads behave. Even if there is no reference to the `Thread` you created, the thread is still running. – svick Jul 16 '11 at 23:32
  • Yes, they will die once they finish processing, which in my case takes up to 20 seconds. – walter Jul 16 '11 at 23:51
  • My question is which way is better, or whether there are any other alternatives. – walter Jul 16 '11 at 23:53
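svick's point can be checked directly. The sketch below (`ThreadLifetimeDemo` and its methods are hypothetical names, and `Thread.Sleep` stands in for a slow request) starts a thread, drops the only reference to its `Thread` object, forces a garbage collection, and then waits to see whether the thread finishes its work anyway.

```csharp
using System;
using System.Threading;

public static class ThreadLifetimeDemo
{
    // Starts a thread, drops the only reference to its Thread object, forces a
    // garbage collection, and reports whether the thread still finished its work.
    public static bool WorkCompletesAfterReferenceIsDropped()
    {
        var finished = new ManualResetEventSlim(false);
        StartAndForget(finished);

        // A collection does not stop or reclaim a running thread:
        // the runtime keeps it alive whether or not our code still references it.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        // Wait up to five seconds for the abandoned thread to signal completion.
        return finished.Wait(TimeSpan.FromSeconds(5));
    }

    static void StartAndForget(ManualResetEventSlim finished)
    {
        var thread = new Thread(() =>
        {
            Thread.Sleep(200); // stand-in for a slow synchronous request
            finished.Set();
        });
        thread.IsBackground = true;
        thread.Start();
        // No reference to 'thread' survives this method.
    }
}
```

Dropping the references therefore does not cancel anything; each thread simply runs until its request completes.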

2 Answers


If you want to download 1000 files, starting 1000 threads at once is certainly not the best option. Not only will it probably not give you any speedup compared with downloading just a few files at a time, it will also require at least 1 GB of virtual memory. Creating threads is expensive, so try to avoid doing so in a loop.

What you should do instead is use Parallel.ForEach() along with the asynchronous versions of the request and response operations. For example, like this (WPF code):

CancellationTokenSource m_tokenSource;

private void Start_Click(object sender, RoutedEventArgs e)
{
    m_tokenSource = new CancellationTokenSource();
    var token = m_tokenSource.Token; // capture this run's token, not the field
    var urls = …;
    Task.Factory.StartNew(() => Start(urls, token), token);
}

private void Cancel_Click(object sender, RoutedEventArgs e)
{
    m_tokenSource.Cancel();
}

void Start(IEnumerable<string> urlList, CancellationToken token)
{
    Parallel.ForEach(urlList, new ParallelOptions { CancellationToken = token },
                     url => DownloadOne(url, token));

}

void DownloadOne(string url, CancellationToken token)
{
    ReportStart(url);

    try
    {
        var request = WebRequest.Create(url);

        var asyncResult = request.BeginGetResponse(null, null);

        WaitHandle.WaitAny(new[] { asyncResult.AsyncWaitHandle, token.WaitHandle });

        if (token.IsCancellationRequested)
        {
            request.Abort();
            return;
        }

        // disposing the response closes it, even if reading throws
        using (var response = request.EndGetResponse(asyncResult))
        using (var stream = response.GetResponseStream())
        {
            byte[] bytes = new byte[4096];

            while (true)
            {
                asyncResult = stream.BeginRead(bytes, 0, bytes.Length, null, null);

                WaitHandle.WaitAny(new[] { asyncResult.AsyncWaitHandle,
                                           token.WaitHandle });

                if (token.IsCancellationRequested)
                    break;

                var read = stream.EndRead(asyncResult);

                if (read == 0)
                    break;

                // do something with the downloaded bytes
            }
        }
    }
    finally
    {
        ReportFinish(url);
    }
}

This way, when you cancel the operation, all downloads are canceled and no new ones are started. Also, you probably want to set ParallelOptions.MaxDegreeOfParallelism, so that you aren't doing too many downloads at once.
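A minimal sketch of that throttling, assuming a stand-in `work` delegate in place of the real download (`ThrottledLoop` and `RunThrottled` are hypothetical names). It also records the peak number of iterations running at once, which should never exceed `MaxDegreeOfParallelism`. In a real download loop you would set `CancellationToken` on the same `ParallelOptions` instance.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ThrottledLoop
{
    // Runs 'work' for every item, with at most maxParallel items in flight,
    // and returns the highest number of items observed running at the same time.
    public static int RunThrottled(IEnumerable<string> items, int maxParallel, Action<string> work)
    {
        var gate = new object();
        int current = 0;
        int peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(items, options, item =>
        {
            lock (gate)
            {
                current++;
                if (current > peak) peak = current;
            }
            try
            {
                work(item); // e.g. a call like DownloadOne(url, token)
            }
            finally
            {
                lock (gate) { current--; }
            }
        });

        return peak;
    }
}
```

With `maxParallel = 4` and 1000 URLs, at most four downloads are in flight at any moment, instead of 1000 dedicated threads.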

I'm not sure what you want to do with the files you are downloading; depending on that, using a StreamReader might be a better option.
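Both `WaitAny` calls in `DownloadOne` follow the same wait-then-check pattern. As a sketch, that pattern could be factored into a helper like the one below (`CancellableWait.WaitForCompletion` is a hypothetical name). It gives cancellation priority when both handles are signaled, just like the `IsCancellationRequested` check that follows each `WaitAny` in the answer.

```csharp
using System;
using System.Threading;

public static class CancellableWait
{
    // Blocks until the asynchronous operation completes or the token is canceled.
    // Returns false if cancellation was requested (even when the operation also finished).
    public static bool WaitForCompletion(IAsyncResult asyncResult, CancellationToken token)
    {
        WaitHandle.WaitAny(new[] { asyncResult.AsyncWaitHandle, token.WaitHandle });
        return !token.IsCancellationRequested;
    }
}
```

Inside `DownloadOne`, the first wait would then read `if (!CancellableWait.WaitForCompletion(asyncResult, token)) { request.Abort(); return; }`.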

svick
  • I don't see a thread-abort or leave-to-die way of processing in your sample, correct me if I am wrong; it looks like your point is that converting the synchronous WebRequest into an asynchronous one is the better approach in this scenario. I have checked .NET 4 code and found a few samples of cancelling WebRequests and nothing about leaving a thread to die by itself, so I will most likely go that route. Thanks – walter Jul 17 '11 at 04:18
  • @walter, yes, I think it is better this way. For one, why would you want a “cancel” that actually keeps the current downloads running? – svick Jul 17 '11 at 11:55
  • Note that my answer blocks the thread that does the downloading. This is not ideal and I now think it should be rewritten, especially if you can use `async` from C# 5. – svick Mar 02 '12 at 15:20
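For reference, here is a rough sketch of what such a rewrite might look like with C# 5 `async`/`await`. It swaps in `HttpClient` (.NET 4.5+) for `WebRequest` and uses a `SemaphoreSlim` instead of `Parallel.ForEach` to limit how many downloads are in flight; `AsyncDownloader`, `DownloadAllAsync`, `DownloadOneAsync` and `ConsumeAsync` are hypothetical names. No thread is blocked while waiting, and the token is passed to every awaited call. Whether a read that is already in progress is interrupted depends on the stream implementation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncDownloader
{
    // Downloads every URL with at most maxConcurrent requests in flight and
    // returns the number of bytes read from each one.
    public static async Task<long[]> DownloadAllAsync(
        HttpClient client, IEnumerable<string> urls, int maxConcurrent, CancellationToken token)
    {
        using (var throttle = new SemaphoreSlim(maxConcurrent))
        {
            List<Task<long>> tasks = urls.Select(async url =>
            {
                await throttle.WaitAsync(token); // wait for a free slot (or cancellation)
                try
                {
                    return await DownloadOneAsync(client, url, token);
                }
                finally
                {
                    throttle.Release();
                }
            }).ToList();

            return await Task.WhenAll(tasks);
        }
    }

    static async Task<long> DownloadOneAsync(HttpClient client, string url, CancellationToken token)
    {
        // ResponseHeadersRead returns once the headers arrive, so the body is streamed.
        using (HttpResponseMessage response =
                   await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead, token))
        {
            response.EnsureSuccessStatusCode();
            using (Stream stream = await response.Content.ReadAsStreamAsync())
            {
                return await ConsumeAsync(stream, token);
            }
        }
    }

    // Reads the stream to the end in 4 KB chunks, passing the token to every read.
    public static async Task<long> ConsumeAsync(Stream stream, CancellationToken token)
    {
        var buffer = new byte[4096];
        long total = 0;
        int read;
        while ((read = await stream.ReadAsync(buffer, 0, buffer.Length, token)) > 0)
        {
            total += read; // do something with the downloaded bytes
        }
        return total;
    }
}
```

Cancelling the token makes the pending `await`s throw an `OperationCanceledException`, which surfaces from `DownloadAllAsync`.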

I think the best solution is to use Parallel.ForEach with cancellation. Please check the following code.

  1. To implement cancellation, first create a CancellationTokenSource and pass its token to Parallel.ForEach through ParallelOptions.
  2. When you want to cancel, call CancellationTokenSource.Cancel().
  3. After cancellation, Parallel.ForEach throws an OperationCanceledException, which you need to handle.

There is a good article about parallel programming related to my answer: Task Parallel Library by Sacha Barber on CodeProject.

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

CancellationTokenSource tokenSource = new CancellationTokenSource();
ParallelOptions options = new ParallelOptions()
{
    CancellationToken = tokenSource.Token
};

// Fill this with the URLs to download (a null source makes Parallel.ForEach throw).
List<string> urlList = new List<string>();

// parallel foreach cancellation
try
{
    ParallelLoopResult result = Parallel.ForEach(urlList, options, (url) =>
    {
        // Each iteration runs on a pooled (or the calling) thread; no new Thread is created per URL.
        Worker workerObject = new Worker();
        workerObject.DoWork(url);
    });
}
catch (OperationCanceledException)
{
    Console.WriteLine("Operation Cancelled");
}

UPDATED

The following code is "Parallel Foreach Cancellation Sample Code".

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        List<int> data = ParallelEnumerable.Range(1, 10000).ToList();

        CancellationTokenSource tokenSource = new CancellationTokenSource();

        // Request cancellation after one second.
        Task cancelTask = Task.Factory.StartNew(() =>
            {
                Thread.Sleep(1000);
                tokenSource.Cancel();
            });

        ParallelOptions options = new ParallelOptions()
        {
            CancellationToken = tokenSource.Token
        };

        //parallel foreach cancellation
        try
        {
            Parallel.ForEach(data, options, (x, state) =>
            {
                Console.WriteLine(x);
                Thread.Sleep(100);
            });
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("Operation Cancelled");
        }

        Console.ReadLine();
    }
}
Jin-Wook Chung
  • That's not how cancellation in TPL works. And the article you linked to explains that. If your task is supposed to support cancellation, you have to manually check whether it is canceled. The `OperationCanceledException` is not thrown automagically (only `ThreadAbortException` does that). – svick Jul 16 '11 at 23:39
  • @svick: No, it's not. If a user calls `CancellationTokenSource.Cancel()`, the loop is canceled as soon as the step running at that time has ended. – Jin-Wook Chung Jul 16 '11 at 23:52
  • I see. `Task Cancellation` works as you mentioned, but cancellation of parallel loops and PLINQ is different: if a parallel loop or a PLINQ query is canceled, an OperationCanceledException is thrown. – Jin-Wook Chung Jul 16 '11 at 23:57
  • If user calls `CancellationTokenSource.Cancel()`, it just sets a property on the `CancellationToken`, that's all it does. You have to manually call `CancellationToken.ThrowIfCancellationRequested()` (or check `CancellationToken.IsCancellationRequested`). – svick Jul 16 '11 at 23:58
  • Ah, I think I see what you mean. Calling `Cancel()` doesn't stop the tasks that are currently executing, but it stops new tasks from being started. Is this what you meant? If so, I think you should make that clear in your answer. – svick Jul 17 '11 at 00:03
  • @svick: Exactly when an operation is cancelled may be important in some cases, but your former point is not true. – Jin-Wook Chung Jul 17 '11 at 00:21
  • @jwJung Is your point to use TPL to manage thread cancellation, am I right? Do you know if WebRequest is smart enough to notice the cancellation? I have not used TPL in the past and my project just uses old-style threads and processing; how would those two approaches coexist? – walter Jul 17 '11 at 00:32
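The behavior debated in these comments can be checked with a small experiment. In this sketch (`LoopCancellationDemo.Run` is a hypothetical name, and `Thread.Sleep` stands in for real work), calling `Cancel()` only sets a flag on the token: an iteration that is already running keeps going unless it polls the token itself, `Parallel.ForEach` stops starting new iterations, and the loop then throws `OperationCanceledException`.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LoopCancellationDemo
{
    // Requests cancellation once 'cancelAfter' iterations have started.
    // Returns true if Parallel.ForEach reported the cancellation by throwing
    // OperationCanceledException; 'started' receives the number of iterations that began.
    public static bool Run(int itemCount, int cancelAfter, out int started)
    {
        var cts = new CancellationTokenSource();
        var options = new ParallelOptions { CancellationToken = cts.Token, MaxDegreeOfParallelism = 2 };
        int count = 0;
        bool canceled = false;

        try
        {
            Parallel.ForEach(Enumerable.Range(0, itemCount), options, item =>
            {
                if (Interlocked.Increment(ref count) == cancelAfter)
                    cts.Cancel(); // only sets a flag on the token; nothing is interrupted

                // An iteration that is already running is not stopped by Cancel();
                // long-running work has to poll the token itself.
                for (int step = 0; step < 10; step++)
                {
                    if (cts.Token.IsCancellationRequested)
                        return; // leave this iteration early
                    Thread.Sleep(5); // simulate a slice of work (e.g. one read)
                }
            });
        }
        catch (OperationCanceledException)
        {
            // Thrown by Parallel.ForEach itself after the in-flight iterations have
            // returned; it stops scheduling further iterations once Cancel() is called.
            canceled = true;
        }

        started = count;
        return canceled;
    }
}
```

So both comments describe part of the picture: the loop does throw on its own, but a long-running body (such as a blocking `GetResponse`) only stops early if it checks the token or is aborted.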