I'm new to threading so have some patience please.

I have tens of thousands of rows in a database. Each row represents a job that has to be done over the internet. I read a data row, do some network-related work (which can take anywhere from a couple of seconds to a couple of minutes), and then grab the next data row. My C# application is a console application, not a GUI one. As you might expect, I want to do these jobs concurrently.

I looked into this subject and I thought I would use BackgroundThreads, but if I understand correctly people suggest there is no point in using them in a console application.

I assume I should not use Tasks, because each of my "tasks" will be represented by a single thread.

So I thought I would use ThreadPool with regular Threads.

To keep things simple I just want to maintain a constant number of threads (spawning a new one whenever one finishes) until I run out of things to do. Then I wait for more data (usually a lot of it) to arrive in the database and spawn threads again. I need to know when a thread ends because I have to spawn a new thread and update the database row containing the data it was working with. To keep threads and database in sync I would probably have to mark the database row with some kind of thread id when it is retrieved and then mark the row (success/fail) when the thread ends. Is this solution (try/catch in the thread delegate) enough to be sure that a thread has ended, and to know whether it succeeded or threw an exception?

I am not sure how to "wait" for the first thread to end, as opposed to waiting for all of them or for one particular thread.

I also think that I don't want to read too much data in advance (and potentially wait for a thread to free up) because there might be other programs doing the same thing using the same database.

Any ideas appreciated!

user1713059
    Since your bottleneck will be network rather than CPU, you should not use tasks. This might be a good place to use the async/await features of .NET 4.5 – mao47 Nov 13 '13 at 20:09

1 Answer

Just use Parallel.ForEach to do this:

Parallel.ForEach(rows, row => ProcessRow(row));

If you need to cap the degree of parallelism because the automatic partitioner happens to be using too many thread pool threads, you can specify it like so:

Parallel.ForEach(rows, new ParallelOptions { MaxDegreeOfParallelism = 5 },
    row => ProcessRow(row));
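
To address the success/fail bookkeeping asked about in the question, here is a minimal sketch of the call above wrapped in a complete console program. The row ids, the simulated `ProcessRow` body, and the `results` dictionary are hypothetical stand-ins; in real code the two marked lines would be where the database row gets updated.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParallelDemo
{
    // Hypothetical stand-in for the real network job.
    // In this simulation odd ids succeed and even ids throw.
    static void ProcessRow(int rowId)
    {
        Thread.Sleep(50); // simulate network latency
        if (rowId % 2 == 0)
            throw new InvalidOperationException("simulated failure for row " + rowId);
    }

    // Processes every row id with at most maxParallel running at once
    // and records, per row, whether it succeeded.
    public static ConcurrentDictionary<int, bool> Run(int[] rowIds, int maxParallel)
    {
        var results = new ConcurrentDictionary<int, bool>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(rowIds, options, rowId =>
        {
            try
            {
                ProcessRow(rowId);
                results[rowId] = true;   // real code: mark the row as succeeded
            }
            catch (Exception)
            {
                results[rowId] = false;  // real code: mark the row as failed
            }
        });

        return results;
    }

    public static void Main()
    {
        var results = Run(Enumerable.Range(1, 10).ToArray(), 5);
        Console.WriteLine("succeeded: " + results.Count(r => r.Value));
        Console.WriteLine("failed: " + results.Count(r => !r.Value));
    }
}
```

Catching inside the delegate means one failed row does not abort the whole loop; without the try/catch, `Parallel.ForEach` would stop scheduling new iterations and rethrow the failures as an `AggregateException`.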
Servy
  • I am not sure how I would continuously feed it with data. Are you suggesting I somehow buffer the data? Read a portion, execute Parallel.ForEach, wait for it to finish, read another portion? It needs to run at max capacity the whole time and I cannot read the whole database at once. – user1713059 Nov 13 '13 at 20:33
  • @user1713059 If you pass in an `IEnumerable` that loads the rows lazily you don't have any problems with that. If you don't have a method of streaming your rows then you can batch them instead if you want. – Servy Nov 13 '13 at 20:34
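
The lazy-loading idea from the comment above could look roughly like the following sketch. The `StreamRows` iterator and the in-memory `Queue<int>` are hypothetical stand-ins for a method that fetches the next pending row from the database on demand. The `NoBuffering` partitioner option is an addition not mentioned in the thread: it asks `Parallel.ForEach` to pull one item at a time instead of reading chunks ahead, which matters when other programs consume rows from the same table.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LazyFeedDemo
{
    // Hypothetical lazy source: each row is produced only when the loop asks
    // for it. Real code would query the database for the next pending row here.
    public static IEnumerable<int> StreamRows(Queue<int> pending)
    {
        while (pending.Count > 0)
            yield return pending.Dequeue();
    }

    // Runs the work over a lazily evaluated sequence and returns how many
    // rows were processed.
    public static int Run(IEnumerable<int> rows, int maxParallel)
    {
        int processed = 0;

        // NoBuffering: hand out items one by one rather than in prefetched chunks.
        var partitioner = Partitioner.Create(rows, EnumerablePartitionerOptions.NoBuffering);
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(partitioner, options, row =>
        {
            Thread.Sleep(20); // simulated network work for this row
            Interlocked.Increment(ref processed);
        });

        return processed;
    }

    public static void Main()
    {
        var pending = new Queue<int>(Enumerable.Range(1, 20));
        Console.WriteLine("processed: " + Run(StreamRows(pending), 5));
    }
}
```

Because a worker asks for the next item as soon as it finishes the previous one, the loop stays at the configured capacity for as long as the source keeps yielding rows. `EnumerablePartitionerOptions` requires .NET 4.5 or later.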