I'm coming to Julia from years of R and MATLAB, where vectorised code was key for performance and has shaped my thinking. Having read the wonderful blog post "More Dots" by S. Johnson, I'm convinced that Julia's syntactic loop fusion is a great feature of the language, but it leaves me a bit confused when it comes to porting my previously vectorised code.
Is it bad or poor form to have some functions vectorised over a few parameters? Should I instead write "scalar" versions of all the routines and call the dot version (or broadcast / map) when I want to iterate over an array?
For the sake of argument, say I write the following:
function sphere_vol(r)
    return 4/3 * pi * r^3
end
and then use it in another function:
function density(N, r)
    V = sphere_vol(r)
    return N / V
end
and so on (many functions calling each other). My instinct (coming from other languages) is to define most functions as vectorised whenever convenient, as in sphere_vol(r) = 4/3*pi*r.^3 (a trivial change). Is there any point in doing so, if I can automatically call sphere_vol.(r::Array) (or map(sphere_vol, r::Array)) instead?
For a single-argument function this is not an important decision to make, but for multiple arguments it can change the implementation dramatically: I could define a version of density(N, r) vectorised over r, or over N, or over both (returning a matrix), but it is much less trivial to write this way (it needs to call broadcast internally over sphere_vol.() and ./). In R and MATLAB I made this choice on a case-by-case basis, with this compromise in mind:
- vectorisation is (was) more convenient and more efficient: I can call density(N::Vector, r::Vector) once and get a full array of values;
- writing a function vectorised over several parameters quickly becomes cumbersome and/or unmanageable (two parameters is usually OK with some tricks), especially when the return values are not scalars.
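If I understand broadcasting correctly, even the "vectorised over both parameters, returning a matrix" case would not need a dedicated method in Julia: broadcasting a column vector against a row vector should produce the full grid. A sketch with made-up values:

```julia
sphere_vol(r) = 4/3 * pi * r^3
density(N, r) = N / sphere_vol(r)

Ns = [10.0, 20.0]        # 2-element column vector
rs = [1.0, 2.0, 3.0]     # 3-element vector

# Broadcasting Ns against the transposed rs yields a 2×3 matrix
# with M[i, j] == density(Ns[i], rs[j]).
M = density.(Ns, rs')
```

So the scalar definition seems to cover all three vectorised variants (over r, over N, or over both) with no extra code.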
I do not know how to make this judgment call when implementing a new function in Julia.
Would it be correct to say that in Julia I'd be better off writing only "scalar" versions like the ones above? Or will some routines be more efficient if vectorised over certain parameters (e.g. special functions such as Bessel, which call Fortran routines)? I'm guessing there's a subtle balance to find (partly a matter of taste and style), but with much less impact on performance than in R or MATLAB.
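For reference, here is the fusion behaviour I have in mind when leaning towards scalar definitions: a chain of dot calls (or the `@.` macro, which adds the dots for you) compiles to one loop and can even write into a preallocated output with no temporaries. A minimal sketch, with made-up values:

```julia
sphere_vol(r) = 4/3 * pi * r^3
density(N, r) = N / sphere_vol(r)

rs = collect(0.1:0.1:1.0)
Ns = fill(5.0, length(rs))

# In-place, fused evaluation: @. rewrites this as
# out .= density.(Ns, rs), a single loop with no temporaries.
out = similar(rs)
@. out = density(Ns, rs)
```

This is the part that I believe has no analogue in R or MATLAB: in those languages only the library's own vectorised primitives are fast, whereas here the scalar user function is the primitive.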