TL;DR: open up your pool outside the timer and choose a more difficult operation.
There's two main issues. One is what Ander mentioned in his comment, starting up the parallel pool takes a second or two. You can open up it beforehand (in MATLAB you can do this via parpool
) to speed that up. Alternatively, run a single parallel operation, thus opening the pool, and then redo the timing.
The second issue is the simplicity of your operation. Just squaring a number cannot go much faster than it already goes in serial. There's no point in passing data back-and-forth between workers for such a simple operation. Redo your test with a more expensive function, e.g. eig()
as MATLAB does in their examples.
Parallellisation is thus useful if the runtime of your operations greatly outweighs the overhead of passing data to and from workers. Basically this means you either have a very large data set, which you need to perform the same operation on each item (e.g. taking the mean of every 1000 rows or so), or you have a few heavy, but independent, tasks to perform.
For a more in-depth explanation I can recommend this answer of mine and references therein.
Just as a sidenote, I'm surprised your serial for
is that fast, given that you do not initialise your output vector. Preallocation is very important, as "growing" arrays in loops requires the creation of a new array and copying all previous content to it every iteration.
You also might want to consider not using i
or j
as variable names, as they denote the imaginary unit. It won't affect runtime much, but can result in very hard to debug errors. Simply use idx
, ii
, or a more descriptive variable name.