11

Our scenario is a network scanner.

It connects to a set of hosts and scans them in parallel for a while using low priority background threads.

I want to be able to schedule lots of work but have only a fixed number of hosts (say ten) scanned in parallel at any given time. Even if I create my own threads, the many callbacks and other asynchronous goodness use the ThreadPool, and I end up running out of resources. I should look at MonoTorrent...

If I use THE ThreadPool, can I limit my application to some number of threads that will leave enough for the rest of the application to run smoothly?

Is there a thread pool that I can initialize to n long-lived threads?

[Edit] No one seems to have noticed that I made some comments on some responses so I will add a couple things here.

  • Threads should be cancellable both gracefully and forcefully.
  • Threads should have low priority leaving the GUI responsive.
  • Threads are long running but in Order(minutes) and not Order(days).

Work for a given target host is basically:

  For each test
    Probe target (work is done mostly on the target end of an SSH connection)
    Compare probe result to expected result (work is done on engine machine)
  Prepare results for host

Can someone explain why using SmartThreadPool is marked with a negative usefulness?

John Saunders
LogicMagic
  • Jeff Sternal's answer seems appropriate, though instead of BackgroundWorker you might be better off creating your own `System.Threading.Thread` instances. However, you also mention that even within your own threads, you are running out of thread-pool resources because of asynchronous operations. Maybe you should explain why you are performing asynchronous operations within a thread that you've dedicated specifically to run those operations. – Michael Petito Apr 22 '10 at 13:35
  • After some more searching I came across another ThreadPool http://smartthreadpool.codeplex.com/ which might hit the spot. I use asynch stuff in the worker thread to handle Output and Errors as they happen. As the application runs, users are given feedback so they see how far along each task is. – LogicMagic Apr 22 '10 at 14:40
  • The application is written using GTK# and the MVC pattern. There is a GUI where, when certain events fire, the GUI thread is marshalled and updates are made. The scanning is not simple pings; the application is an authenticated scanner that looks at the configuration of targets. A scan should take O(minutes) time to complete per host. Some per-host reports are done on the engine machine on the per-host thread for obvious reasons. A process is created per connection, which is probably costly, but compared to minutes it does not matter. – LogicMagic Apr 22 '10 at 14:55
  • just to be pedantic, C# doesn't have any thread scheduling at all. In fact, it doesn't have threads. .NET, of course, has both of these. – John Saunders Apr 26 '10 at 18:51

5 Answers

8

In .NET 4 you have the integrated Task Parallel Library. When you create a new Task (the new thread abstraction) you can specify that the Task is long running. We have had good experiences with that (long being days rather than minutes or hours).

You can use it in .NET 2 as well but there it's actually an extension, check here.

In VS2010, debugging of parallel applications based on Tasks (not threads) has been radically improved. It's advisable to use Tasks rather than raw threads whenever possible, since they let you handle parallelism in a more object-oriented way.

UPDATE
Tasks that are NOT specified as long running are queued to the thread pool (or to whichever other scheduler you use).
But if a task is specified as long running, a standalone Thread is created for it and no thread pool is involved.
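
To make the long-running option concrete, here is a minimal sketch that combines it with the limit of N parallel hosts asked for in the question. It assumes .NET 4 or later; `ThrottledScanner`, `ScanAll` and the `scanHost` delegate are hypothetical names chosen for illustration, and the `SemaphoreSlim` throttle is one possible approach rather than something the TPL prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the TPL.
public static class ThrottledScanner
{
    // Runs scanHost once per host with at most maxParallel scans in flight.
    // Each scan gets its own dedicated thread via TaskCreationOptions.LongRunning,
    // so the scans do not occupy thread-pool threads.
    public static void ScanAll(IEnumerable<string> hosts, int maxParallel,
                               Action<string, CancellationToken> scanHost,
                               CancellationToken token)
    {
        var tasks = new List<Task>();
        using (var slots = new SemaphoreSlim(maxParallel))
        {
            foreach (string host in hosts)
            {
                // Stop scheduling new scans once cancellation is requested;
                // scans that are already running receive the same token.
                if (token.IsCancellationRequested) break;

                slots.Wait();          // blocks until one of the slots is free
                string current = host; // local copy for the closure
                tasks.Add(Task.Factory.StartNew(() =>
                {
                    try { scanHost(current, token); }
                    finally { slots.Release(); } // always hand the slot back
                }, CancellationToken.None, TaskCreationOptions.LongRunning,
                   TaskScheduler.Default));
            }

            // Waits for every started scan to finish; throws AggregateException
            // afterwards if any scan threw.
            Task.WaitAll(tasks.ToArray());
        }
    }
}
```

A call such as `ThrottledScanner.ScanAll(hosts, 10, (host, ct) => Probe(host, ct), cts.Token)` would then keep ten scans running until the list is exhausted, where `Probe` stands in for the real per-host work. Cancellation here is cooperative only: the delegate has to check the token itself.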

ntziolis
  • The purpose of the TPL is to add a sensible layer of abstraction over threading and scheduling different tasks. The beauty of the TPL is that you can specify a **scheduler**. The scheduler **CAN** in fact be the thread pool. But you can just as well define your own. For more info check: http://bit.ly/aW4Lq4 and http://bit.ly/9VAkbf – ntziolis Apr 23 '10 at 12:22
6

The CLR ThreadPool isn't appropriate for executing long-running tasks: it's for performing short tasks where the cost of creating a thread would be nearly as high as executing the method itself. (Or at least a significant percentage of the time it takes to execute the method.) As you've seen, .NET itself consumes thread pool threads, so you can't reserve a block of them for yourself without risking starving the runtime.

Scheduling, throttling, and cancelling work is a different matter. There's no other built-in .NET worker-queue thread pool, so you'll have to roll your own (managing the threads or BackgroundWorkers yourself) or find a preexisting one (Ami Bar's SmartThreadPool looks promising, though I haven't used it myself).
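
As a rough illustration of the "roll your own" route, the sketch below keeps n long-lived, low-priority threads that pull work items from a shared queue. It assumes .NET 4 or later for `BlockingCollection<T>`; `FixedWorkerPool` and its members are hypothetical names, and only cooperative cancellation through a `CancellationToken` is covered.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: a fixed set of long-lived, low-priority worker threads.
public sealed class FixedWorkerPool : IDisposable
{
    private readonly BlockingCollection<Action<CancellationToken>> queue =
        new BlockingCollection<Action<CancellationToken>>();
    private readonly CancellationTokenSource cts = new CancellationTokenSource();
    private readonly List<Thread> workers = new List<Thread>();

    public FixedWorkerPool(int threadCount)
    {
        for (int i = 0; i < threadCount; i++)
        {
            var worker = new Thread(WorkLoop)
            {
                IsBackground = true,                 // does not keep the process alive
                Priority = ThreadPriority.BelowNormal // leaves the GUI thread responsive
            };
            workers.Add(worker);
            worker.Start();
        }
    }

    // Queues a work item; it runs as soon as one of the workers is free.
    public void Enqueue(Action<CancellationToken> work)
    {
        queue.Add(work);
    }

    // Graceful cancel: running items are signalled through the token,
    // items still in the queue are skipped.
    public void Cancel()
    {
        cts.Cancel();
    }

    private void WorkLoop()
    {
        // GetConsumingEnumerable blocks until an item arrives and
        // ends once CompleteAdding() was called and the queue is empty.
        foreach (Action<CancellationToken> work in queue.GetConsumingEnumerable())
        {
            if (cts.IsCancellationRequested) continue; // drain without running
            try { work(cts.Token); }
            catch (Exception ex)
            {
                // A failing item must not take the worker thread down with it.
                Console.Error.WriteLine("Work item failed: " + ex.Message);
            }
        }
    }

    // Stops accepting work, lets the queue drain, and waits for the workers to exit.
    public void Dispose()
    {
        queue.CompleteAdding();
        foreach (Thread worker in workers) worker.Join();
        queue.Dispose();
        cts.Dispose();
    }
}
```

With `new FixedWorkerPool(10)`, any number of hosts can be queued through `Enqueue` while at most ten are processed at a time, and none of the pool's threads come from the CLR ThreadPool.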

Jeff Sternal
  • Background workers are intended for scheduling long-running tasks while keeping the UI responsive. They are not the right choice for application logic threading, however. – ntziolis Apr 22 '10 at 13:26
  • @ntziolis - I agree, though we don't have enough information about the application to totally rule it out: a 'network scanner' could mean [a GUI application](http://www.softperfect.com/products/networkscanner/). – Jeff Sternal Apr 22 '10 at 13:27
  • That is true, maybe the creator could add some additional information? – ntziolis Apr 22 '10 at 13:30
  • The reason for a pool is to schedule all the work and set a maximum for the total number of concurrent threads that should run at any given time. I also forgot to mention that the user should be able to abort the threads. – LogicMagic Apr 22 '10 at 15:00
1

In your particular case, the best option is neither threads, the thread pool, nor BackgroundWorker, but the asynchronous programming model (BeginXXX, EndXXX) provided by the framework.

The advantage of the asynchronous model is that the TCP/IP stack uses callbacks whenever there is data to read, and each callback is automatically run on a thread from the thread pool.

Using the asynchronous model, you can control the number of requests initiated per time interval. If you want, you can also initiate all the requests from a lower-priority thread while processing the requests on a normal-priority thread, which means the packets stay in the networking stack's internal TCP queue for as short a time as possible.

Asynchronous Client Socket Example - MSDN

P.S. For multiple concurrent, long-running jobs that don't do a lot of computation but mostly wait on I/O (network, disk, etc.), a callback mechanism is always a better option than threads.
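
The Begin/End pattern described above can be sketched with `Socket.BeginConnect` and `Socket.EndConnect`. `AsyncProbe`, `ConnectAll` and the `maxInFlight` limit are hypothetical names for illustration, and a real scanner would go on to read and write on the socket instead of closing it straight away.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

// Hypothetical helper illustrating the BeginXXX/EndXXX pattern.
public static class AsyncProbe
{
    // Starts one asynchronous connect per endpoint, with at most maxInFlight
    // connects outstanding, and returns how many of them succeeded.
    public static int ConnectAll(IPEndPoint[] targets, int maxInFlight)
    {
        int connected = 0;
        var slots = new SemaphoreSlim(maxInFlight);
        var allDone = new CountdownEvent(targets.Length);

        foreach (IPEndPoint target in targets)
        {
            slots.Wait(); // throttle how many requests are initiated at once
            var socket = new Socket(target.AddressFamily,
                                    SocketType.Stream, ProtocolType.Tcp);

            // No thread is blocked while the connect is pending; the callback
            // is invoked by the framework once the operation completes.
            socket.BeginConnect(target, ar =>
            {
                var s = (Socket)ar.AsyncState;
                try
                {
                    s.EndConnect(ar); // throws if the connect failed
                    Interlocked.Increment(ref connected);
                }
                catch (SocketException)
                {
                    // Host unreachable or connection refused: count as not connected.
                }
                finally
                {
                    s.Close();
                    slots.Release();
                    allDone.Signal();
                }
            }, socket);
        }

        allDone.Wait(); // the calling thread only waits for the overall result
        return connected;
    }
}
```

The only thread that blocks here is the one calling `ConnectAll`. However many hosts are probed, no thread is dedicated to a host while its connect is pending.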

Pop Catalin
1

I'd create your own thread manager. In the following simple example a Queue is used to hold waiting threads and a Dictionary is used to hold active threads, keyed by ManagedThreadId. When a thread finishes, it removes itself from the active dictionary and launches another thread via a callback.

You can change the max running thread limit from your UI, and you can pass extra info to the ThreadDone callback for monitoring performance, etc. If a thread fails because of, say, a network timeout, you can put its work back into the queue (as a new Thread, since a finished Thread cannot be restarted). Add extra control methods to Supervisor for pausing, stopping, etc.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

namespace ConsoleApplication1
{
    public delegate void CallbackDelegate(int idArg);

    class Program
    {
        static void Main(string[] args)
        {
            new Supervisor().Run();
            Console.WriteLine("Done");
            Console.ReadKey();
        }
    }

    class Supervisor
    {
        Queue<System.Threading.Thread> waitingThreads = new Queue<System.Threading.Thread>();
        Dictionary<int, System.Threading.Thread> activeThreads = new Dictionary<int, System.Threading.Thread>();
        int maxRunningThreads = 10;
        object locker = new object();
        volatile bool done;

        public void Run()
        {
            // queue up some threads
            for (int i = 0; i < 50; i++)
            {
                Thread newThread = new Thread(new Worker(ThreadDone).DoWork);
                newThread.IsBackground = true;
                waitingThreads.Enqueue(newThread);
            }
            LaunchWaitingThreads();
            while (!done) Thread.Sleep(200);
        }

        // keep starting waiting threads until we max out
        void LaunchWaitingThreads()
        {
            lock (locker)
            {
                while ((activeThreads.Count < maxRunningThreads) && (waitingThreads.Count > 0))
                {
                    Thread nextThread = waitingThreads.Dequeue();
                    activeThreads.Add(nextThread.ManagedThreadId, nextThread);
                    nextThread.Start();
                    Console.WriteLine("Thread " + nextThread.ManagedThreadId.ToString() + " launched");
                }
                done = (activeThreads.Count == 0) && (waitingThreads.Count == 0);
            }
        }

        // this is called by each thread when it's done
        void ThreadDone(int threadIdArg)
        {
            lock (locker)
            {
                // remove thread from active pool
                activeThreads.Remove(threadIdArg);
            }
            Console.WriteLine("Thread " + threadIdArg.ToString() + " finished");
            LaunchWaitingThreads(); // this could instead be put in the wait loop at the end of Run()
        }
    }

    class Worker
    {
        CallbackDelegate callback;
        public Worker(CallbackDelegate callbackArg)
        {
            callback = callbackArg;
        }

        public void DoWork()
        {
            System.Threading.Thread.Sleep(new Random().Next(100, 1000));
            callback(System.Threading.Thread.CurrentThread.ManagedThreadId);
        }
    }
}
```
Ed Power
-2

Use the built-in threadpool. It has good capabilities.

Alternatively you can look at the Smart Thread Pool implementation here or at Extended Thread Pool for a limit on the maximum number of working threads.

Nathan Tuggy
Kangkan