
I have used a parfor loop, but the CPU usage is only around 50%. The configuration of my computer is shown in the picture below. Are there only 4 cores that I can use? Is there a command to enable all of the cores? Does it matter how I write the parfor loop?

The simplified code is as follows:

n = 5;
d = 2^n;
r0 = [2 3];
m = d^2;
delta0 = 0:0.05:0.5;
% note: s is assumed to be defined elsewhere in the full script

ave = 50;
tic;
for j = 1:length(r0)
    for k1 = 1:length(delta0)
        delta = delta0(k1);
        r = r0(j);

        FdRho = zeros(1,ave); % preallocate the sliced parfor output
        parfor i = 1:ave
            % Getdataz and Solve_CC_rhoE_zGau are function files
            [Pauli,y,rhoT,noiseT] = Getdataz(d,n,r,m,s,delta);
            [rhoE,noiseE] = Solve_CC_rhoE_zGau(n,r,m,s,Pauli,y,noiseT,delta);

            rhoE = rhoE/trace(rhoE);

            FdRho(i) = fullfidelity(rhoT,rhoE);
        end
        out.delta(k1,1) = delta;
        out.FdRho(k1,:) = FdRho;
    end
end
toc;

[Screenshot: system information showing a CPU with 4 physical cores / 8 logical processors]

Mengr
  • Please note that `parfor` is not a magic wand. Naively changing a `for` into a `parfor` might not increase execution speed and can even lead to syntax errors; see the links in my answer. Without you showing us the inner functions, we can't help you much, beyond guessing that your code is actually faster when run sequentially. – Adriaan Jun 07 '22 at 08:14

2 Answers


Multithreading is a complicated subject, especially in MATLAB, where you have little control over how it is done.

Performance

First off, parallelization is not the only way to increase performance and should not be your only go-to method. Here is what MATLAB suggests. @Adriaan also suggests a few improvements that would probably improve performance more than throwing additional CPU resources at the problem.

Why not 100%?

The reason you're not getting 100% CPU usage is that, by default, MATLAB uses as many workers as you have physical cores. Your CPU has 4 physical cores and 8 logical ones, which is why you see about 50% usage.
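
You can check this on your machine with a quick sketch (assuming the default local cluster profile; the exact profile name may differ between MATLAB releases):

c = parcluster('local');   % default local cluster profile
disp(c.NumWorkers)         % defaults to the number of physical cores (4 in your case)
disp(maxNumCompThreads)    % computational threads used by MATLAB's implicit multithreading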

Full CPU usage does not mean shorter execution time

Getting to 100% does not guarantee that your code will execute faster. There are multiple reasons why it might or might not work for you. If you are interested, look at the comments under this post and at this post from MATLAB Answers. The ultimate answer is that you have to try it and time your execution to see whether using more resources actually improves your execution time.
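
As a minimal timing sketch (the function handle below is just a stand-in for your actual workload; also note that the first parfor call starts the pool, which adds one-off overhead):

f = @() sum(svd(rand(200)));    % stand-in for one expensive iteration
N = 200;

y = zeros(1,N);
tic;
for k = 1:N
    y(k) = f();
end
tSerial = toc;

yp = zeros(1,N);
tic;
parfor k = 1:N
    yp(k) = f();
end
tParallel = toc;

fprintf('serial: %.2f s, parfor: %.2f s\n', tSerial, tParallel);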

Getting to 100%

There are two ways you can force MATLAB to use 100% of your CPU.

  1. You can increase the number of workers that MATLAB uses to match the number of logical cores you have
  2. You can increase the number of threads each worker uses

To do that:

  • In the Home toolbar
  • Under ENVIRONMENT
  • Go to the drop-down menu Parallel
  • Select Create and Manage Clusters...
  • In the list on the left, select the cluster you want to use (in general local (default))
  • Click Edit on the bottom right
  • Increase NumWorkers (for option 1) or increase NumThreads (for option 2)
  • If you increased NumWorkers, you might also need to increase the preferred number of workers in the parallel preferences so that all of them actually start in your parallel pool. A scripted equivalent of these steps is sketched below.
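
If you prefer to script this instead of using the GUI, a minimal sketch (assuming the default local profile; adjust the numbers to your machine):

c = parcluster('local');
c.NumWorkers = 8;          % option 1: one worker per logical core
c.NumThreads = 1;          % option 2 would instead be NumWorkers = 4, NumThreads = 2
saveProfile(c);            % persist the modified profile
delete(gcp('nocreate'));   % shut down any existing pool
parpool(c, c.NumWorkers);  % start a pool with the new settings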

Final notes

  • Be careful when increasing NumWorkers. In my personal experience it can crash MATLAB on Ubuntu 22.04.
  • Do not expect too much improvement. In general, MATLAB is all about floating-point operations. However, each physical core usually has only one floating-point unit (FPU), so you might not get as much improvement as you'd hope.
  • The MATLAB preset for the number of workers is usually a good rule of thumb. I would recommend working on other aspects of optimization and only meddling with this as a last resort.
LNiederha
  • In my experience, when increasing NumWorkers doesn't work really well (which I assumed was due to the still-fresh Ubuntu 22, but I was probably wrong about that), increasing NumThreads gives me better performance and does use more of the CPU. Also, if you have a source for that info, I'm interested in reading about it. – LNiederha Jun 07 '22 at 07:25
  • See e.g. [this comment](https://stackoverflow.com/questions/31056513/how-to-enable-multithreading-in-matlab#comment50156300_31056513) by the author of the parallel toolbox. There are better references by him; I'm trying to dig those up in the meantime. – Adriaan Jun 07 '22 at 07:33
  • [Here](https://ch.mathworks.com/matlabcentral/answers/80129-definitive-answer-for-hyperthreading-and-the-parallel-computing-toolbox-pct#answer_89845) is a pretty interesting discussion on MATLAB Answers concerning multithreading. I would say, reading it carefully, that it doesn't rule against using logical cores, although I am willing to admit that it shows it is very much dependent on the actual use case. I take it one worker per physical core is a good rule of thumb, but you MIGHT get better performance; that, however, would take a bit of experimenting. – LNiederha Jun 07 '22 at 07:41
  • Thanks, that was what I was looking for! Edric's the author of the parallel computing toolbox. So basically it depends on the algorithm the OP is using (which they haven't shown, as it's buried in those two user-defined functions). `Getdataz` sounds like it might do file I/O, making hyperthreading a possible use case; `Solve_(..)`, on the other hand, sounds strictly numeric. I guess whether hyperthreading helps depends on which of the two carries the heavier load. (There are quite a few other problems with their code that need fixing first though, see my answer.) – Adriaan Jun 07 '22 at 08:02
  • Indeed, there are other issues. And my answer does oversimplify multithreading in MATLAB. Let me edit it to make things more accurate. – LNiederha Jun 07 '22 at 08:17

One of the bigger problems in your code is that you've got three nested for or parfor loops and have parallelised the innermost one. However, it's recommended to parallelise the outermost one whenever possible. Given that you haven't provided us with either Getdataz or Solve_CC_rhoE_zGau, I'm going to assume those are rather light functions, making MATLAB spend more time shifting data back and forth between the workers than on the actual computation.

You only have about 1000 iterations, which isn't a lot for parfor. Either spmd() or parfeval might be better suited to your case. If you want to keep using parfor, rearrange your loops as follows:

parfor k1 = 1:length(delta0)
    for jj = 1:length(r0)
        for ii = 1:ave
            (...)
        end
    end
end

Since r0 only contains two values, don't use that as the outer loop, as you'd only have two parallel threads running that way. Also, given that i and j are the names of built-in functions (the imaginary unit), I usually caution against their use as loop variables.

You might want to read Decide when to use parfor and Convert for-loops into parfor-loops. Usually, parfor is recommended when you have a lot of iterations (orders of magnitude more than your 1000), or when each separate iteration is very heavy (in which case spmd or parfeval are recommended). See this answer of mine for a short summary.
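
If you do want to try parfeval, a minimal sketch could look as follows. Note that runOneCase is a hypothetical wrapper you would have to write around Getdataz and Solve_CC_rhoE_zGau, running the ave inner iterations for one (r, delta) combination and returning the 1-by-ave vector of fidelities:

pool = gcp();                         % start or reuse the current pool
nJobs = numel(r0) * numel(delta0);
F(1:nJobs) = parallel.FevalFuture;    % preallocate the array of futures
job = 0;
for jj = 1:numel(r0)
    for k1 = 1:numel(delta0)
        job = job + 1;
        % runOneCase is hypothetical (see above)
        F(job) = parfeval(pool, @runOneCase, 1, d, n, r0(jj), m, s, delta0(k1), ave);
    end
end
for finished = 1:nJobs
    [done, FdRho] = fetchNext(F);     % collect each result as soon as it finishes
    % map 'done' back to its (jj, k1) pair and store FdRho accordingly
end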

Adriaan
  • I would say that 1000 iterations _might_ be just fine for `parfor` - the tradeoff is data-shuffling vs. execution time of the loop body. The default partitioning for `parfor` sends the same number of sub-ranges to each worker regardless of the total number of loop iterations. Another heuristic might be to consider the serial `for` time vs. number of workers. If there's less than 1 second of work per worker, then you're going to get diminishing returns. But it still depends on the amount of data transfer. – Edric Jun 07 '22 at 13:07
  • Agree that it's nearly always better to have `parfor` outer-most (providing you have enough parallelism - if `length(delta0)` was 3, then that would not be great). – Edric Jun 07 '22 at 13:08
  • @Edric so basically if the inner two user-defined functions are heavy and non-multithreaded (they only require 5 or 6 scalars as input if I read that code correctly, so data shuffling shouldn't be too much of a problem), `parfor` might be faster? Isn't either `spmd` or `parfeval` recommended in that case, given the low number of iterations? – Adriaan Jun 07 '22 at 13:11
  • The `parfor` scheduling generally _tends_ to be faster than `parfeval` (because it has more information about the whole problem). With `spmd`, you're probably going to be looking at a deterministic work distribution - which might be perfectly fine if the execution time is very uniform. In fact I'm struggling to imagine a situation where a given `parfor` loop could run faster using `spmd` or `parfeval` (without significant restructuring). Maybe it could happen, but I doubt it's common. – Edric Jun 09 '22 at 08:27