F# task parallelism under Mono doesn't "appear" to execute in parallel

Question

I have the following dummy code to test the TPL in F# (Mono 4.5, Xamarin Studio, quad-core MacBook Pro).

To my surprise, all the tasks run on the same thread. There is no parallelism at all.

open System
open System.Threading
open System.Threading.Tasks


let doWork (num:int) (taskId:int) : unit =
    for i in 1 .. num do
        Thread.Sleep(10)
        for j in 1 .. 1000 do
            ()
        Console.WriteLine(String.Format("Task {0} loop: {1}, thread id {2}", taskId, i, Thread.CurrentThread.ManagedThreadId)) 

[<EntryPoint>]
let main argv = 

    let t2 = Task.Factory.StartNew(fun() -> doWork 10 2)
    //printfn "launched t2"
    Console.WriteLine("launched t2")
    let t1 = Task.Factory.StartNew(fun() -> doWork 8 1)
    Console.WriteLine("launched t1")
    let t3 = Task.Factory.StartNew(fun() -> doWork 10 3)
    Console.WriteLine("launched t3")
    let t4 = Task.Factory.StartNew(fun() -> doWork 5 4)
    Console.WriteLine("launched t4")
    Task.WaitAll(t1,t2,t3,t4)
    0 // return an integer exit code

However, if I increase the sleep time from 10 ms to 100 ms, I can see a little parallelism.

What have I done wrong? What does this mean? I did consider the possibility that the CPU finishes the work before the TPL can start the task on a new thread, but that doesn't make sense to me. I can make the inner dummy loop `for j in 1 .. 1000 do ()` run 1000 times longer and the result is the same: no parallelism (with `Thread.Sleep` set to 10 ms).

The same code in C#, on the other hand, produces the desired result: all tasks print their messages in a mixed order rather than sequentially.

Update:

As suggested, I changed the inner loop to do some 'actual' work, but everything still executes on a single thread.

Update 2:

I don't quite understand Luaan's comments, but I just ran a test on a friend's PC. With the same code, parallelism works there (even without the thread sleep). It looks like this has something to do with Mono. But can Luaan explain again what I should expect from the TPL? If I have tasks that I want to run in parallel to take advantage of a multicore CPU, isn't the TPL the way to go?

Update 3:

I have tried @FyodorSoikin's suggestion again with dummy code that won't be optimized away. Unfortunately, the workload is still not enough to make the Mono TPL use multiple threads. Currently the only way I can get the Mono TPL to allocate multiple threads is to force a sleep of more than 20 ms on the existing thread. I am not qualified to assert that Mono is wrong, but I can confirm that the same code (same benchmark workload) behaves differently under Mono and Windows.

I think you should use `Thread` instead of `Task`. Tasks are by nature more async than parallel. You can't be sure that they'll be executed on many threads. See this: http://stackoverflow.com/questions/13429129/task-vs-thread-differences — pizycki, Jun 08 '15 at 12:03
I am going to guess that free inner loop gets optimized away, because it doesn't do anything. Try putting something non-optimizable in there (say, a library function call) and see if that makes a difference. — Fyodor Soikin, Jun 08 '15 at 12:05
@FyodorSoikin Doesn't really matter. Looping a thousand times would hardly be measurable even if it didn't get optimized away. I only start seeing some execution time when using `Thread.SpinWait(1000000)`, about 5ms - 1000 is just way too low on modern CPUs. — Luaan, Jun 08 '15 at 12:08
And the jury is in: the code is indeed optimized away, just looked with ILSpy. @Luaan, 1000 empty cycles is indeed not noticeable, but if you put something non-trivial inside (see my answer), it will be fine. — Fyodor Soikin, Jun 08 '15 at 12:12
@FyodorSoikin Making the inner loop do some 'actual' work made no difference. I just replaced the inner loop with a factorial of 2000, but everything still runs on the same thread. The only thing I can think of to make the tasks execute on different threads is to put the executing thread to sleep. — casbby, Jun 08 '15 at 12:14
You seem to miss the point. The compiler is perfectly able to see that your `fact` function has no side effects, and you duly ignore its return value. Therefore, it is perfectly cool to optimize it away. You need to do something that the compiler doesn't know how to optimize. See my answer for example. — Fyodor Soikin, Jun 08 '15 at 12:35

Luaan · Accepted Answer · 2015-06-08T12:21:12.067

It looks like the Sleeps are ignored completely - see how the Task 2 loop is printed even before launching the next task, that's just silly - if the thread waited for 10ms, there's no way for that to happen.

I'd assume that the cause might be the timer resolution in the OS. The Sleep is far from accurate - it might very well be that Mono (or Mac OS) decides that since they can't reliably make you run again in 10ms, the best choice is to simply let you run right now. This is not how it works on Windows - there you're guaranteed to lose control as long as you don't Sleep(0); you'll always sleep at least as long as you wanted. It seems that on Mono / Mac OS, the idea is the reverse - the OS tries to let you sleep at most the amount of time you specified. If you want to sleep for less time than the timer precision allows, too bad - no sleep.

But even if they are not ignored, there's still not a lot of pressure on the thread pool to give you more threads. You're only blocking for less than 100ms, for four tasks in a line - that's not nearly enough for the pool to start creating new threads to handle the requests (on MS.NET, new threads are only spooled after not having any free threads for 200ms, IIRC). You're simply not doing enough work for it to be worth it to spool up new threads!

The point you might be missing is that Task.Factory.StartNew is not actually starting any new threads, ever. Instead, it's scheduling the associated task on the default task scheduler - which just puts it in the thread pool queue, as tasks to execute "at earliest convenience", basically. If there's one free thread in the pool, the first task starts running there almost immediately. The second will run when there's another thread free, etc. Only if the thread usage is "bad" (i.e. the threads are "blocked" - they're not doing any CPU work, but they're not free either) is the thread pool going to spawn new threads.
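
The thread pool behaviour described above can be probed directly. The following is a minimal sketch, not part of the original answer: `raiseMinWorkers`, `startDedicated` and `distinctThreads` are hypothetical helper names, and whether either option changes what Mono does on Mac OS is an assumption that would need testing. `ThreadPool.SetMinThreads` asks the pool to keep more workers ready, and `TaskCreationOptions.LongRunning` hints to the scheduler that a task should not occupy a pool worker.

```fsharp
open System
open System.Collections.Concurrent
open System.Threading
open System.Threading.Tasks

// Option A (assumption: only helps if the pool starts with few workers).
// Ask the pool to keep at least `wanted` worker threads ready.
let raiseMinWorkers (wanted: int) : bool =
    let minWorkers, minIo = ThreadPool.GetMinThreads()
    ThreadPool.SetMinThreads(max minWorkers wanted, minIo)

// Option B: LongRunning is a hint that the task may need its own thread
// instead of waiting for a free pool worker.
let startDedicated (work: unit -> unit) : Task =
    Task.Factory.StartNew(Action(work), TaskCreationOptions.LongRunning)

// Hypothetical demo: run n short blocking tasks and count how many
// distinct managed threads ended up executing them.
let distinctThreads (n: int) : int =
    let ids = ConcurrentBag<int>()
    let tasks =
        Array.init n (fun _ ->
            startDedicated (fun () ->
                Thread.Sleep(50)
                ids.Add(Thread.CurrentThread.ManagedThreadId)))
    Task.WaitAll(tasks)
    ids |> Seq.distinct |> Seq.length
```

Printing `distinctThreads 4` gives a quick check of how many threads were actually used on a given runtime. For CPU-bound batch work the default pool is usually the better fit; `LongRunning` is intended for a small number of long-lived tasks.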

edited Jun 08 '15 at 12:21

answered Jun 08 '15 at 12:05


The whole point of the TPL is that the 4 tasks should be kicked off and execute in parallel. What I am expecting is the "launched" messages mixed in with the tasks' other messages. The whole point is not to have everything executed in sequential order. – casbby Jun 08 '15 at 12:20
@casbby Nope, that's not the point of TPL at all. The point of TPL is to make it easy to handle parallel and asynchronous tasks safely. You're still deferring to the underlying .NET framework and OS to decide what's the best way to do that. That doesn't have to include any parallelism at all. If you make your test something that's actually useful and that takes a bit more time, you'll quickly see that your expected level of parallelism is achieved with no issues. Creating threads is relatively expensive - both in time and memory. Unless you *need* them, you want to avoid creating them. – Luaan Jun 08 '15 at 12:22
It looks like you have pointed out a way in which Mono differs from Windows. I am aware that the TPL relies on the thread pool to schedule tasks, which is why I am all the more surprised that no threads were allocated to the other tasks while the machine was practically idle. For parallel tasks, the TPL is the way to go, isn't it? I want the program (a batch program that processes a large number of matrix calculations) to take up as much CPU as it can. I can live with the machine being a little less responsive and the fans starting up during that time. Please advise. – casbby Jun 08 '15 at 12:42
@casbby Sure, stick with TPL. The thing that's broken is your benchmark, not TPL :) It will work just fine for any real workload - it will tend to spread your load evenly across all the available cores (provided you can avoid sharing data between the tasks, of course). Unless TPL or the thread pool is broken in Mono, but if it were broken like this, it would be pretty much useless, so I wouldn't bet on that. – Luaan Jun 08 '15 at 12:46
@casbby As for the differences between Mono on Mac OS and .NET on Windows, it seems that it both ignores the `Sleep` (under some interval, most likely), and also quite possibly initializes the thread pool differently. But they shouldn't affect the behaviour of a typical parallelised program - just your benchmark. – Luaan Jun 08 '15 at 12:51

Fyodor Soikin · Answer 2 · 2015-06-08T12:41:54.060

If you look at the IL output from this program, you'll see that the inner loop is optimized away, because it doesn't have any side effects, and its return value is completely ignored.

To make it count, put something non-optimizable there, and also make it heavier: 1000 empty cycles is hardly noticeable compared to the cost of spinning up a new task.

For example:

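open System
open System.Diagnostics   // needed for Debug.WriteLine
open System.Threading

let doWork (num:int) (taskId:int) : unit =
    for i in 1 .. num do
        Thread.Sleep(10)
        for j in 1 .. 1000 do
            // A call with a visible side effect, so the loop body cannot be dropped.
            // Debug.WriteLine is conditional on the DEBUG symbol, so use a Debug build.
            Debug.WriteLine("x")
        Console.WriteLine(String.Format("Task {0} loop: {1}, thread id {2}", taskId, i, Thread.CurrentThread.ManagedThreadId))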

Update:
Adding a pure function, such as your fact, is no good. The compiler is perfectly able to see that fact has no side effects and that you duly ignore its return value, and therefore, it is perfectly cool to optimize it away. You need to do something that the compiler doesn't know how to optimize, such as Debug.WriteLine above.
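
Another way to get work that cannot be removed, as an alternative to `Debug.WriteLine`, is to return a value from each task and actually use it. This is a sketch under that assumption: `countPrimes` and `runTasks` are hypothetical names, and trial-division prime counting simply stands in for any real CPU-bound workload.

```fsharp
open System
open System.Threading
open System.Threading.Tasks

// Hypothetical CPU-bound workload: count the primes below `limit`
// by trial division. The result is returned, so it has to be computed.
let countPrimes (limit: int) : int =
    let isPrime n =
        let mutable d = 2
        let mutable prime = n >= 2
        while prime && d * d <= n do
            if n % d = 0 then prime <- false
            d <- d + 1
        prime
    let mutable count = 0
    for n in 2 .. limit - 1 do
        if isPrime n then count <- count + 1
    count

// Start `taskCount` tasks; each reports the managed thread it ran on
// together with its computed result.
let runTasks (taskCount: int) (limit: int) : (int * int) [] =
    let work = Func<int * int>(fun () ->
        (Thread.CurrentThread.ManagedThreadId, countPrimes limit))
    let tasks = Array.init taskCount (fun _ -> Task.Factory.StartNew(work))
    Task.WaitAll(tasks |> Array.map (fun t -> t :> Task))
    tasks |> Array.map (fun t -> t.Result)
```

Calling `runTasks 4 5000000` and printing each pair with `printfn "thread %d -> %d primes"` shows which threads did the work. The limit is arbitrary and may need raising until each task runs long enough for the pool to matter.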

Thanks for the suggestion and for giving me the idea of using ILSpy. I redid the test on Mono with the suggested code, and I still cannot trick Mono into using multiple threads. The only way I got multiple threads was to force a sleep of 20 ms or longer on the existing thread. On Windows, however, multiple threads are allocated to the tasks even with no real processing in the inner loop. — casbby, Jun 08 '15 at 21:45
