How can I make the most of a multithreaded CPU in Matlab?

Question

I just bought the Matlab Parallel Computing toolbox.

The command `matlabpool open` starts as many parallel workers as my CPU has cores.

But each of my CPU cores has two threads. According to Windows Task Manager, each worker uses only half the capacity of one CPU core, which suggests that one worker = one thread = "half a core".

Therefore, even with all workers open, only half of the CPU's total power is being used.

Is there another command that could help with that?

I'm pretty sure your CPU has more than 4 threads. Even a microcontroller (with a basic RTOS) can handle a dozen of threads. However, if your computation is CPU-intensive, spawning a lot of threads will not improve the speed of the processing. — lucasg, Sep 19 '13 at 07:53
Here it is mentioned that [you can specify the number of workers](http://www.mathworks.nl/help/distcomp/matlabpool.html), though I am not sure whether you can exceed your current number. If all else fails, you can always consider [using multiple matlab sessions](http://stackoverflow.com/questions/18204663/run-a-script-that-uses-multiple-matlab-sessions). — Dennis Jaheruddin, Sep 19 '13 at 07:58
To clarify, I suspect what you are talking about is taking full advantage of hyper-threaded processing units. This question has come up on SO before, and I've provided an answer [at this link](http://stackoverflow.com/questions/14468886/matlabpool-number-of-threads-vs-core), so I'm marking this question as a duplicate and voting to close. Please let me know (in this comment thread) if you are actually asking something different. — Colin T Bowers, Sep 19 '13 at 08:36
To be exact, `matlabpool` launches background *processes*, not threads (they communicate with each other using MPI). MATLAB's computation engine (the kernel, if you will) is really single-threaded at its core, although the IDE and various other things (the Java frontend) run in separate threads. Many built-in math functions do have multithreaded implementations, but those are parallelized outside of MATLAB by libraries such as Intel MKL, FFTW, and the like. — Amro, Sep 19 '13 at 10:19

score 3 · Answer 1 · answered Sep 19 '13 at 10:50

By default, the local cluster type for matlabpool considers only "real" cores when choosing the default number of workers to launch. This is because for MATLAB workloads, hyperthreading often does not provide much benefit. However, this value is only a default - you can edit the cluster type and run anything up to 12 local workers.
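As a rough sketch of what that could look like from the command line: the snippet below raises the worker limit of the `local` profile and then opens a pool with one worker per logical core. The value 8 is an assumption (4 physical cores with 2 hardware threads each); substitute the count for your own machine. In R2013b and later, `parpool` replaces `matlabpool`.

```matlab
% Assumption: 4 physical cores x 2 hardware threads = 8 logical cores.
nWorkers = 8;

c = parcluster('local');    % cluster object for the default local profile
disp(c.NumWorkers);         % current limit (defaults to the physical core count)

c.NumWorkers = nWorkers;    % raise the limit for this profile
saveProfile(c);             % persist the change so later sessions see it

matlabpool open local 8     % start the pool; keep this number equal to nWorkers
```

Whether the extra workers actually speed anything up depends on the workload, so it is worth timing your own code both ways.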

score 2 · Answer 2 · answered Jun 05 '14 at 11:05

You need to understand HyperThreading to answer this question.

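Matlab launches one worker for every CPU core. (Strictly speaking, each worker is a separate process rather than a thread, but the reasoning below is the same.) Suppose you now use a construct like parfor to distribute computation over these workers. Every worker will now be crunching numbers happily.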

Suppose you are doing a sum of a large vector of numbers. What actually happens is the following:

sum = sum + a[0]
- array a is not in my CPU cache yet
- I will fetch a small part of a from main memory and put it in the CPU cache
sum = sum + a[1]
sum = sum + a[2]
...

During the fetch of a, the CPU stalls, waiting for the system memory. This is called a pipeline bubble, and it is not good for performance. Sometimes, a part of the array a was swapped out to the hard drive. The operating system will need to access the drive to put that part into main memory, after which it will be transferred to the CPU cache. When this happens, your operating system will not let the CPU wait for +200 ms. It will use that time to execute another task instead (like the backup running on your system, or refreshing your screen, or ...).

Switching tasks on a CPU results in a performance penalty. To switch to a different task, the operating system must save the CPU registers in main memory, and load the CPU registers of the other task back into the CPU first. This takes time.

With HyperThreading, each core holds two sets of registers, so two threads can 'occupy' the core at the same time. They share the core's execution units, so two threads do not run twice as fast as one, but when one thread stalls, the hardware can carry on with the other without the operating system having to perform a context switch.

Forget how Microsoft Windows reports CPU usage. It's wrong. CPU usage is a lot more complicated than only a simple 47%. The real question is rather: should matlab register two threads per core, or only one?

Arguments pro:

During a stall, the CPU can quickly switch to the other thread and continue executing.

Arguments contra:

There are more threads, so the problem is divided into smaller pieces. This may actually reduce performance, as you need to put more pieces together to get the final result.
Two threads sharing a core still compete for its L1 and L2 cache, so each one can evict data that the other thread needs.
If there are no stalls, you only add overhead.
On a desktop, the operating system also wants to run: redrawing the screen, moving your mouse, etc. When all logical CPUs are in use, the operating system has to do an actual (costly) context switch.
Your problem is only complete once all pieces of it have been calculated. Using all the cores / threads increases the risk that one thread takes longer than the rest and holds up the final result.

My guess is that the Matlab developers considered the arguments contra to be more important than the arguments pro. My own performance tests certainly suggest that there is little performance gain from HyperThreading for CPU-intensive calculations.
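If you want to check this on your own machine, a minimal benchmark along these lines might do. It is only a sketch, not the test I ran: the core counts, the number of tasks and the matrix size are all assumptions, and the workload (eigenvalues of random matrices) is just a stand-in for a CPU-bound job. It runs the same parfor loop once with one worker per physical core and once with one worker per logical core. In R2013b and later, use `parpool` and `delete(gcp)` instead of `matlabpool`.

```matlab
physicalCores = 4;                    % assumption: adjust to your CPU
logicalCores  = 2 * physicalCores;    % assumption: two hardware threads per core

% Allow the local profile to start the larger pool.
c = parcluster('local');
c.NumWorkers = logicalCores;
saveProfile(c);

poolSizes = [physicalCores, logicalCores];
elapsed   = zeros(size(poolSizes));
nTasks    = 64;                       % number of independent work items
n         = 300;                      % matrix size per item (CPU-bound work)

for k = 1:numel(poolSizes)
    matlabpool('open', 'local', poolSizes(k));
    result = zeros(1, nTasks);
    tic;
    parfor i = 1:nTasks
        A = rand(n);                      % each iteration is independent
        result(i) = max(abs(eig(A)));     % stand-in for real number crunching
    end
    elapsed(k) = toc;                 % wall-clock time for this pool size
    matlabpool('close');
    fprintf('%d workers: %.2f s\n', poolSizes(k), elapsed(k));
end
```

Run it a few times and compare the two timings. A memory-bound or I/O-bound workload may behave differently from this one, so ideally replace the loop body with a piece of your real computation.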
