
There are always several ways to do the same thing in Mathematica. For example, when adapting WReach's solution for my recent problem I used Condition:

ClearAll[ff];
SetAttributes[ff, HoldAllComplete];
ff[expr_] /; (Unset[done]; True) := 
 Internal`WithLocalSettings[Null, done = f[expr], 
  AbortProtect[If[! ValueQ[done], Print["Interrupt!"]]; Unset[done]]]

However, we can do the same thing with Block:

ClearAll[ff];
SetAttributes[ff, HoldAllComplete];
ff[expr_] := 
 Block[{done}, 
  Internal`WithLocalSettings[Null, done = f[expr], 
   AbortProtect[If[! ValueQ[done], Print["Interrupt!"]]]]]

Or with Module:

ClearAll[ff];
SetAttributes[ff, HoldAllComplete];
ff[expr_] := 
 Module[{done}, 
  Internal`WithLocalSettings[Null, done = f[expr], 
   AbortProtect[If[! ValueQ[done], Print["Interrupt!"]]]]]

There are probably several other ways to do the same thing. Which way is the most efficient in terms of memory and CPU use (f may return very large arrays of data, or very small ones)?

Alexey Popkov

1 Answer


Both Module and Block are quite efficient, so the overhead they induce is only noticeable when the body of the function whose variables you localize does very little. There are two major reasons for the overhead: the scoping analysis itself (scoping constructs must analyze the code they enclose to resolve possible name conflicts and bind variables - this takes place for both Module and Block), and the creation and destruction of new symbols in the symbol table (only for Module). For this reason, Block is somewhat faster.
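To see where the Module-specific part of the overhead comes from, here is a small illustration (the numeric suffix comes from $ModuleNumber and will differ between sessions):

(* Module generates a fresh symbol, such as x$1234, in the symbol table on
   every invocation; Block simply reuses the existing symbol x, saving and
   restoring its value. *)
Module[{x}, Print[x]]   (* prints something like x$1234 *)
Block[{x}, Print[x]]    (* prints x *)

To see how much faster Block is in practice, you can do a simple experiment: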

In[14]:= 
Clear[f, fm, fb, fmp];
f[x_] := x;                        (* plain function, no scoping *)
fm[x_] := Module[{xl = x}, xl];    (* local variable via Module *)
fb[x_] := Block[{xl = x}, xl];     (* local variable via Block *)
Module[{xl}, fmp[x_] := xl = x]    (* persistent local variable, created once *)

We defined here four functions with the simplest possible body - just return the argument, possibly assigned to a local variable. We can expect the effect to be most pronounced here, since the body does very little.

In[19]:= f/@Range[100000];//Timing
Out[19]= {0.063,Null}

In[20]:= fm/@Range[100000];//Timing
Out[20]= {0.343,Null}

In[21]:= fb/@Range[100000];//Timing
Out[21]= {0.172,Null}

In[22]:= fmp/@Range[100000];//Timing
Out[22]= {0.109,Null} 

From these timings, we see that Block is about twice as fast as Module, but that the last version, which uses a persistent variable created by Module only once, is about twice as efficient as Block and almost as fast as a plain function invocation (because the persistent variable is created only once, and there is no scoping overhead when the function is applied).
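The reason becomes apparent if you look at the definition that the last construct produced (a quick check; the exact xl$nnn suffix is session-dependent):

Definition[fmp]
(* fmp[x_] := xl$1234 = x
   The Module wrapper ran only once, at definition time, renaming xl to a
   unique symbol; each call to fmp is then a plain assignment to that one
   symbol, with no per-call scoping work. *)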

For real functions, and most of the time, the overhead of either Module or Block should not matter, so I'd use whatever is safer (usually Module). If it does matter, one option is to use persistent local variables created by Module only once. If even this overhead is significant, I'd reconsider the design, since then your function obviously does too little. There are cases when Block is more beneficial, for example when you want to be sure that all the memory used by local variables will be automatically released (this is particularly relevant for local variables with DownValues, since they are not always garbage-collected when created by Module). Another reason to use Block is when you expect the possibility of interrupts, such as exceptions or aborts, and want the local variables to be automatically reset (which Block does). By using Block, however, you risk name collisions, since it binds variables dynamically rather than lexically.
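As a toy illustration of the collision risk (the symbols g, gm, h, and a below are made up for this example): a Block-scoped variable that shares a name with a symbol used elsewhere silently shadows it, while Module is immune to this, because it renames its variables.

ClearAll[g, gm, h, a];
h[x_] := x + a;                   (* h happens to refer to a global symbol a *)
g[x_] := Block[{a = 0}, h[x]];    (* Block localizes a dynamically *)
gm[x_] := Module[{a = 0}, h[x]];  (* Module renames a, so h still sees the global a *)
a = 100;
g[1]    (* 1: inside the Block, the global a is shadowed by a = 0 *)
gm[1]   (* 101: Module's a is a different symbol, so h uses the global a = 100 *)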

So, to summarize: in most cases, if you feel that your function has serious memory or run-time inefficiency, look elsewhere - it is very rare for scoping constructs to be the major bottleneck. Exceptions include Module variables with accumulated data that are not garbage-collected, very lightweight functions used very frequently, and functions operating on very efficient low-level structures such as packed arrays and sparse arrays, where the symbolic scoping overhead may be comparable to the time the function takes to process its data, since the body is very efficient and uses fast functions that bypass the main evaluator.

EDIT

By combining Block and Module in the fashion suggested here:

Module[{xl}, fmbp[x_] := Block[{xl = x}, xl]]

you can have the best of both worlds: a function as fast as a Block-scoped one and as safe as one that uses Module.
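If you want to check this on your machine (exact timings will of course vary), run the combined version through the same benchmark as above; it should come out close to the Block-based fb:

fmbp /@ Range[100000]; // Timing   (* expect a result close to the fb timing above *)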

Leonid Shifrin
  • Very clear timing comparison, thank you! Just a note: it would also be very interesting to compare memory-efficiency of these functions from the point of view of memory leaks and memory fragmentation induced by `Block` and `Module` as compared to cycles `Set`-`Unset`, `Set`-`Clear`, `Set`-`Remove`. At this moment I have no idea how it could be made (probably some statistics should be collected for many evaluations of some specially designed code), but I think that such comparison would be extremely interesting and valuable. – Alexey Popkov Sep 29 '11 at 13:33
  • @Alexey I agree that the data on memory consumption / fragmentation would be quite useful. But, as you noted, this is a hard task, because the details of memory allocation are not exposed to the user. Also, since memory management can be considered an implementation detail (from the user's viewpoint), we can not be sure that whatever data we collect will remain true for future versions, given that implementations do change. This situation will only change if the future versions of Mathematica would provide a finer user-level control over memory management (which I doubt). – Leonid Shifrin Sep 29 '11 at 17:48
  • @Alexey That said, my naive expectation would be that, in terms of memory consumption / fragmentation, `Module` would be comparable to `Set - Remove`, while `Block` - to `Set-Clear`. I don't think that there would be significant differences, since I view both `Module` and `Block` as higher-level constructs, which in some sense live "on top" of a system that has `Set`, `Unset`, `Remove`, and `Clear`. – Leonid Shifrin Sep 29 '11 at 17:53
  • Hi Leonid, Nice answer. There could be one correction to the part near the beginning: Unlike Module, Block does _not_ do any lexical analysis of its argument (since there is no symbol renaming necessary). This can be an important difference in some situations. To see the difference, construct an expression where the lexical analysis is expensive but the evaluation is cheap: scoped=With[{large=ConstantArray[x,10^7]},Hold[{x=5},If[True,x,large]]]; Now evaluate and cache everything: Timing[scoped;]. Then compare Timing[Module @@ scoped] with Timing[Block @@ scoped]. – Andrew Moylan Oct 02 '11 at 05:01
  • @Andrew Thanks Andrew, good point! Block does not do renaming indeed, but it still does perform the variable bindings, even though the binding is dynamic. This is what I meant by scoping overhead. For example, if you use `Timing[ x = 5; result = scoped[[2]]; Unset[x]; result]`, you will see that this is near-instantaneous, while `Block` takes a sizable fraction of a second for your example on my machine. So, I still stand by the point that there *is* a scoping overhead associated with `Block`, just not as large as for `Module`. Please do get me corrected if I get this wrong though. – Leonid Shifrin Oct 02 '11 at 12:40
  • @Andrew In other words, my current understanding is that `Block` *does some* lexical analysis, although less of it than `Module`. Otherwise, I find it hard to explain the time difference between using `Block` and imitating `Block` with top-level assignments, as described in my previous comment. What it does not do is that it does not create new symbols, and it does not literally replace the entries in the original code with generated symbols - which, I think, mostly accounts for the time difference between `Module` and `Block`. – Leonid Shifrin Oct 02 '11 at 12:45
  • @Andrew A little clarification to the above: the effect I mentioned is only observable when you *do not* cache (do not perform `Timing[scoped;]` first). When you do, `Block` *is* instantaneous, but that's because caching effectively did the binding for it, I guess. As soon as you reassign the result, like so say: `Timing[Block @@ (scoped1 = scoped);]`, you will see the overhead again, so I'd consider caching a rather special case, not a general use case for `Block`. – Leonid Shifrin Oct 02 '11 at 13:02