I have done quite a bit of research on this, but I still can't get it right. I have to generate a PDF file with 1000 pages (I use a library) and need to do this N times for different data. The data sets are independent of each other, so I can generate the files in parallel, which is what I'm trying to do. Ideally I would like to use, say, 10 threads, each generating its PDF in memory and saving it at the end. Say it takes 15 minutes per PDF (with 1000 pages): done sequentially, that would be 150 minutes for 10 PDF files, versus maybe 30 minutes if I use 10 threads. I know people are not very fond of threading, but how else can I speed it up?

I was looking at ThreadPool, but then I saw the new Task class in .NET 4.0. I read that I can force each task to run on a separate thread by using TaskCreationOptions.LongRunning, but that does not seem to work for me. I tried ThreadPool too, but each PDF is generated from a URL and, for some reason, the WebRequest.Create(url) call does not seem to execute when called from a thread-pool thread. In any case, I would rather make the new Task library work.

This is what I have now but it still seems to execute sequentially.

Task myTask = Task.Factory.StartNew(() =>
                {
                  //code for the task.
                  //get html content
                  //generate pdf file.
                }, new CancellationToken(false), TaskCreationOptions.LongRunning, TaskScheduler.Default);

myTask.Wait();

What am I doing wrong here? If you have any suggestions, please let me know. I can't go above .NET 4.0 at the moment.

Alex J
  • How many tasks do you create? This just starts a single new thread and then awaits the result – flup Jun 03 '13 at 22:32
  • Define... "That does not seem to work for me". What is your approximate timing going sequentially vs multi-threaded? – Kevin Jun 03 '13 at 22:32
  • May I suggest Parallel.ForEach? – redtuna Jun 03 '13 at 22:43
  • @flup: say around 100. I know the count up front and can create the tasks in a for loop. – Alex J Jun 04 '13 at 00:20
  • Both the sequential version and what I have take the same amount of time. @redtuna: if I'm not wrong, Parallel.ForEach spreads work across multiple cores rather than running multiple threads on a single CPU. I need multiple threads, not one per core. – Alex J Jun 04 '13 at 00:23
  • @AlexJ: Parallel.ForEach can indeed spawn more threads than you have cores. That said, if you know exactly how many threads you want, you should go ahead and use threads. – redtuna Jun 04 '13 at 01:44
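
For readers following the Parallel.ForEach suggestion in the comments, here is a minimal sketch of how it might look on .NET 4.0. The `GeneratePdf` method and the example URLs are hypothetical stand-ins for the real download-and-render code, and the sleep only simulates slow work. Note that `MaxDegreeOfParallelism` is an upper limit on concurrency, not a guarantee that exactly that many threads will run.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class ParallelForEachSketch
{
    // Hypothetical placeholder for "fetch the HTML and render the PDF".
    static void GeneratePdf(string url)
    {
        Thread.Sleep(200); // simulate slow work
    }

    static void Main()
    {
        // Assumed inputs; the real list would come from the application.
        string[] urls =
        {
            "http://example.com/a",
            "http://example.com/b",
            "http://example.com/c",
            "http://example.com/d"
        };

        // Allow at most 10 items to be processed concurrently.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 10 };

        // Thread-safe collection for recording completed items.
        var done = new ConcurrentBag<string>();

        // Blocks until every item has been processed.
        Parallel.ForEach(urls, options, url =>
        {
            GeneratePdf(url);
            done.Add(url);
        });

        Console.WriteLine("Generated {0} of {1} documents", done.Count, urls.Length);
    }
}
```

With around 100 documents, this keeps the number of concurrent jobs bounded without managing threads by hand.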

2 Answers

11

myTask.Wait() makes your control thread's execution halt until the task completes... You don't want to halt execution right after firing off one of these tasks.

What you want to do is create multiple tasks at once, start them, and then call Task.WaitAll(array) to wait for them ALL to complete instead of waiting for one at a time.

// Define your tasks and start them all
var task1 = Task.Factory.StartNew(() => { /*do something*/ });
var task2 = Task.Factory.StartNew(() => { /*do something*/ });
var task3 = Task.Factory.StartNew(() => { /*do something*/ });

// Wait for ALL tasks to finish
// Control will block here until all 3 finish in parallel
Task.WaitAll(new[] { task1, task2, task3 });
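
When the number of jobs is only known at run time, the same idea can be sketched with a loop and an array of tasks. This is an illustration under assumptions, not the asker's real code: `GeneratePdf` and the job count of 10 are hypothetical, and `TaskCreationOptions.LongRunning` is only a hint to the scheduler (with the default scheduler it typically results in a dedicated thread per task).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class WaitAllSketch
{
    // Hypothetical placeholder for the real PDF generation.
    static void GeneratePdf(int jobId)
    {
        Thread.Sleep(500); // simulate slow work
        Console.WriteLine("Job {0} finished on thread {1}",
            jobId, Thread.CurrentThread.ManagedThreadId);
    }

    static void Main()
    {
        const int jobCount = 10; // assumed number of PDFs
        var tasks = new Task[jobCount];

        for (int i = 0; i < jobCount; i++)
        {
            // Copy the loop variable so each lambda captures its own value.
            int jobId = i;

            tasks[i] = Task.Factory.StartNew(
                () => GeneratePdf(jobId),
                CancellationToken.None,
                TaskCreationOptions.LongRunning,
                TaskScheduler.Default);
        }

        // Start everything first, then block once until all tasks complete.
        Task.WaitAll(tasks);
        Console.WriteLine("All jobs finished");
    }
}
```

The important difference from the code in the question is that nothing waits inside the loop. Waiting on each task right after starting it serializes the work again.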
Haney
  • I was under the impression that you'll still only get one task executing per core doing it this way? – Jammer Jun 03 '13 at 23:27
  • Approximately one per core, yes. There's a heuristic that determines how to schedule/execute them. The OP could mark these Long Running if they so chose; I am merely giving an example. – Haney Jun 04 '13 at 00:03
  • I would just go ahead and use Threads directly in this situation. Threading isn't as scary in .NET as a lot of people feel! – Jammer Jun 04 '13 at 06:39
  • TPL is the new hotness, but I totally also prefer the "feel" of classic Threads. ;) – Haney Jun 04 '13 at 15:12
  • http://stackoverflow.com/questions/1774670/c-sharp-threadpool-vs-tasks says "Starting with the .NET Framework 4, the TPL is the preferred way to write multithreaded and parallel code." http://msdn.microsoft.com/en-us/library/dd460717.aspx – Alex J Jun 04 '21:00
0

If you think you know how many threads you want, you should just go ahead and use threads. Just start all those threads, and then wait for all of them to complete.
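
A minimal sketch of that approach, assuming 10 worker threads and a hypothetical `GeneratePdf` method standing in for the real work:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class ThreadSketch
{
    // Hypothetical placeholder; the signature matches ParameterizedThreadStart.
    static void GeneratePdf(object state)
    {
        int jobId = (int)state;
        Thread.Sleep(500); // simulate slow work
        Console.WriteLine("Job {0} finished on thread {1}",
            jobId, Thread.CurrentThread.ManagedThreadId);
    }

    static void Main()
    {
        const int threadCount = 10; // assumed number of PDFs
        var threads = new List<Thread>();

        // Start all threads before waiting on any of them.
        for (int i = 0; i < threadCount; i++)
        {
            var worker = new Thread(new ParameterizedThreadStart(GeneratePdf));
            worker.Start(i); // the current value of i is passed at start time
            threads.Add(worker);
        }

        // Join blocks until the given thread has finished.
        foreach (var started in threads)
        {
            started.Join();
        }

        Console.WriteLine("All threads finished");
    }
}
```

One thing to keep in mind with raw threads is that an unhandled exception on a worker thread terminates the process, so the real worker method would probably want its own try/catch.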

redtuna