
A question asked today gave a surprising result with respect to implicit array creation:

array1 = 5*rand(496736,1);
array2 = 25*rand(9286,1);
output = zeros(numel(array1), numel(array2)); % Requires 34GB RAM
output = zeros(numel(array1), numel(array2),'logical'); % Requires 4.3GB RAM
output = abs(bsxfun(@minus, array1.', array2)) <= 2; % Requires 34GB RAM
output = pdist2(array1(:), array2(:)) <= 2; % Requires 34GB RAM

So far so good. An array of 496736*9286 double values should indeed occupy 34GB, and a logical array with the same number of elements needs just 4.3GB (8 times smaller, since a logical takes 1 byte versus 8 bytes for a double). The latter two lines require the full 34GB because they build an intermediate matrix containing all the distance pairs in double precision, whereas the logical matrix is pre-allocated directly as logicals and needs only 4.3GB.
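For reference, the reported sizes check out; here is the byte arithmetic (sketched in Python only because it is convenient to run; "GB" in the comments above is really GiB):

```python
# Sanity-check the reported array sizes: MATLAB stores a double in 8 bytes
# and a logical in 1 byte.
n_rows, n_cols = 496736, 9286
elements = n_rows * n_cols

double_gib = elements * 8 / 2**30   # full distance matrix in double precision
logical_gib = elements * 1 / 2**30  # same matrix stored as logicals

print(f"double:  {double_gib:.1f} GiB")   # ≈ 34.4 GiB
print(f"logical: {logical_gib:.1f} GiB")  # ≈ 4.3 GiB
```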

The surprising part comes here:

output = abs(array1.' - array2); % Requires 34GB RAM
output = abs(array1.' - array2) <= 2; % Requires 4.3GB RAM ?!?

What?!? Why doesn't implicit expansion require the same 34GB of RAM due to the creation of the intermediate double matrix `output = abs(array1.' - array2)`?

This is especially strange since implicit expansion is, as I understood it, a shorthand for the old `bsxfun` solutions. So why does `bsxfun` create the full 34GB matrix, whereas implicit expansion does not?

Does MATLAB somehow recognise that the output of the operation should be a logical matrix?


All tests performed on MATLAB R2018b, Ubuntu 18.04, 16GB RAM (i.e. 34GB arrays error out)

Adriaan
  • Some glorious hint from the [`bsxfun` doc](https://www.mathworks.com/help/matlab/ref/bsxfun.html#mw_6a9dc665-1dbf-4408-a8c4-c327974c7074): _Compared to using `bsxfun`, implicit expansion offers faster speed of execution, better memory usage, and improved readability of code._ Guess MathWorks put the magic right there, and of course, there's no documentation on that. Maybe predicting the proper output type by parsing arguments from outer to inner is a thing now!? Hopefully, some non-MathWorks people can provide more information. – HansHirse Sep 16 '19 at 11:32
  • @HansHirse That's a very interesting hint indeed. However, as for _speed of execution_ I don't see a big difference. I [tested that in the past](https://stackoverflow.com/q/42559922/2586922), and the difference was small, and only noticeable for small arrays. This is in agreement with [what Steve Eddins said](https://blogs.mathworks.com/loren/2016/10/24/matlab-arithmetic-expands-in-r2016b/#comment-46516). I've just repeated the tests in R2019a, and the results are the same. – Luis Mendo Sep 16 '19 at 13:02
  • Is it possible that the JIT computes compound statements element-wise? (I mean, `d=a+b+c` not being computed as `tmp=a+b; d=tmp+c`, but rather as `d(i)=a(i)+b(i)+c(i)`?) Anyone with the ability to test this (max memory usage, for example)? – Cris Luengo Sep 16 '19 at 13:29
  • Trimming array1 to a fourth of its length allows executing the code on my machine, and it does NOT use the same RAM: lots of RAM for the double output, much less RAM for the logical output. I'd guess the element-wise theory is true. – Yuval Harpaz Sep 16 '19 at 13:46
  • Some hints from [Julia](https://julialang.org/blog/2017/01/moredots). Maybe MATLAB has chosen other methods, like splitting the data and computing chunk by chunk. – rahnema1 Sep 16 '19 at 14:44
  • I guess that @CrisLuengo is right; MATLAB is probably doing something close to `output = bsxfun(@(x,y)(abs(x-y)<2),array1.',array2)`. If you put the comparison operator inside the anonymous function, `bsxfun` should consume less memory. – obchardon Sep 16 '19 at 15:43
  • @CrisLuengo That is quite likely, as those types of compound instructions are often available as ALU primitives in modern CPUs. – Ander Biguri Sep 16 '19 at 15:59
  • @AnderBiguri True. Plus, it would be a big win for an interpreter/JIT as it reduces the amount of intermediate memory used and improves the cache usage. – Cris Luengo Sep 16 '19 at 16:01
  • What's the next stage in this optimization? `x = ones(1e200, 1e300); clear x` running in no time? – Luis Mendo Sep 16 '19 at 17:52
  • @LuisMendo: that's what a good optimizing compiler should do. There's no point in executing a statement without effect. – Cris Luengo Sep 16 '19 at 18:13
  • @Cris I always thought an interpreter executed statements independently of each other. I find its lack of discreteness disturbing. – Luis Mendo Sep 16 '19 at 18:24
  • It would be interesting to see what MATLAB actually executes. I did a lot of work with C code generation; there it is obviously much easier: just compare your initial C code with the optimized one, and if in any doubt run both versions in your test suite. – Daniel Feb 18 '20 at 22:17
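None of this is documented, but the chunk-by-chunk strategy suggested in the comments can be sketched as follows (in NumPy rather than MATLAB, purely for illustration — the function name and chunk size are made up, and this only shows that the logical result can be built without ever materialising the full double matrix):

```python
import numpy as np

def chunked_within_tol(a, b, tol=2.0, chunk=1024):
    """Logical matrix |a_i - b_j| <= tol, without the full double intermediate."""
    out = np.empty((a.size, b.size), dtype=bool)
    for start in range(0, a.size, chunk):
        stop = min(start + chunk, a.size)
        # Only a (chunk x b.size) block of doubles exists at any moment;
        # it is reduced to booleans immediately and then discarded.
        out[start:stop] = np.abs(a[start:stop, None] - b[None, :]) <= tol
    return out

# Small sanity check against the direct (intermediate-materialising) version.
rng = np.random.default_rng(0)
a = 5 * rng.random(5000)
b = 25 * rng.random(300)
assert np.array_equal(chunked_within_tol(a, b),
                      np.abs(a[:, None] - b[None, :]) <= 2)
```

Whether MATLAB's JIT actually chunks like this, fuses the whole expression element-wise, or does something else entirely remains the open question.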

0 Answers