16

I was reading a useful post on the WRI blog about improving the speed of code, and I need help understanding one point in it.

Compare these speeds

Timing[
 tbl = Table[i + j, {i, 1, 1000}, {j, 1, 1000}];     
]

{0.031, Null}

and

Timing[
 a = 1000;
 tbl = Table[i + j, {i, 1, a}, {j, 1, a}];
 ]

{0.422, Null}

So it is much faster when the actual value of the limit is written inside the Table itself rather than assigned outside. The explanation given for this, which I am sure is correct but which I need help understanding, is that Table is compiled only if its limits are numeric, and that this is because Table has the attribute HoldAll.

But my question is: How would the above actually work, because the limits to Table must, at one point, become numeric anyway? I can't write

Clear[a]
tbl = Table[i + j, {i, 1, a}, {j, 1, a}]

The above gives an error.

So, to me, writing a=1000 outside Table vs. inside should have made no difference, since without a having a numerical value, Table[] can't do anything. So the replacement of a by the number 1000 must be done by the evaluator at some point before Table[] can do anything useful, would it not?

In other words, what Table should see, eventually, is {i, 1, 1000}, {j, 1, 1000} in both cases.

So, the way I thought this would happen is this:

  1. Evaluator replaces a by 1000 in the arguments of table
  2. Evaluator calls Table with the result, which is now all numeric.
  3. Table Compiles, and runs faster now.

But what seems to happen is something else (due to HoldAll?):

  1. Table takes its arguments as is. Since it has HoldAll, it sees a and not 1000.
  2. It does not call Compile, since its arguments are not all numbers.
  3. It now generates a table with the limit a; the evaluator evaluates a to 1000.
  4. The table is generated with all limits numeric, but more slowly, since the code is not compiled.

The question is: is the above roughly what happens? Could someone explain the steps that take place and account for this difference in timing?
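One way to check the picture above is to look at the attributes of Table and at how the two results are stored. This is only a sketch: the names t1 and t2 are made up for illustration, and whether t2 comes back unpacked may depend on the Mathematica version.

```mathematica
(* Sketch: inspect what Table is given and what it returns. *)
Attributes[Table]    (* the list includes HoldAll *)

a = 1000;
t1 = Table[i + j, {i, 1, 1000}, {j, 1, 1000}];   (* literal limits *)
t2 = Table[i + j, {i, 1, a}, {j, 1, a}];         (* limits given through a *)

t1 === t2                             (* the values are the same either way *)
Developer`PackedArrayQ /@ {t1, t2}    (* any difference is in internal storage only *)
```

If the second list reports packed for t1 but not for t2, that matches the timing difference seen above.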

Also, how would one ensure that Table is compiled in both cases in the above example, even if one uses a variable for the limit? It is not always possible to hardcode the numbers for the table limits; sometimes one must use variables for them. Should one explicitly use the Compile command? (I do not use Compile directly, since I assumed it is done automatically when needed.)
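Two candidate ways to keep a variable limit while still handing Table a number are sketched below: injecting the value with With, and calling Compile explicitly. The names tblWith, cf and tblCompiled are made up for illustration, and this is not claimed to be the only or the recommended approach.

```mathematica
a = 1000;

(* Option 1: With substitutes the value of a for n inside the held Table expression *)
tblWith = With[{n = a}, Table[i + j, {i, 1, n}, {j, 1, n}]];

(* Option 2: explicit Compile; n is declared as a machine integer *)
cf = Compile[{{n, _Integer}}, Table[i + j, {i, 1, n}, {j, 1, n}]];
tblCompiled = cf[a];

tblWith === tblCompiled                            (* both give the same values *)
Developer`PackedArrayQ /@ {tblWith, tblCompiled}   (* check how each result is stored *)
```

With keeps the code closest to the original one-liner; Compile makes the compilation step explicit, at the cost of fixing the argument type.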

edit(1)

In answer to post by Mike below on finding no difference in timing when using a call.

ClearAll[tblFunc];
Timing[a = 1000;
 tblFunc[a_] := Table[i + j, {i, 1, a}, {j, 1, a}];
 Developer`PackedArrayQ[tblFunc[a]]
 ]

gives

{0.031, True}

But that is because a is now the number 1000 INSIDE the function once it is called, since M passes things by VALUE.

If we force the call to be by reference, so that a is left unevaluated, then we get

ClearAll[tblFunc];
Timing[a = 1000;
 tblFunc[a_] := Table[i + j, {i, 1, a}, {j, 1, a}];
 Developer`PackedArrayQ[tblFunc[Unevaluated@a]]
 ]

Now we see the expected result: since a is still symbolic INSIDE the function, we are back to square one, and it is slow, since the result is not packed. And since it is not packed, Compile is not used.

{0.437, False}
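The difference between the two calls can be made visible without any timing, by returning the argument wrapped in Hold. The functions byValue and byName below are hypothetical helpers introduced only for this sketch; HoldFirst plays the same role here as Unevaluated does in the call above.

```mathematica
ClearAll[byValue, byName];
SetAttributes[byName, HoldFirst];   (* the argument is not evaluated before matching *)

byValue[x_] := Hold[x];
byName[x_] := Hold[x];

a = 1000;
byValue[a]   (* Hold[1000] : the body sees the number *)
byName[a]    (* Hold[a]    : the body sees the symbol *)
```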

edit(2) Thanks to everyone for the answers, I think I learned a lot from them.

Here is an executive summary, just to make sure I got everything ok.

[image: executive summary]

edit(3)

Here are links I have collected with hints on making Mathematica code run faster.

  1. http://library.wolfram.com/howtos/faster/
  2. http://blog.wolfram.com/2011/12/07/10-tips-for-writing-fast-mathematica-code/
  3. https://stackoverflow.com/questions/4721171/performance-tuning-in-mathematica
  4. Using Array and Table Functions in Mathematica. Which is best when
Nasser
  • 4
    Note that you can have things like `Table[i + j, {i, 1, 1000}, {j, 1, i}]` which is a reason for `Table` not to be precompiled when the limits aren't all numeric. – David Z Jan 03 '12 at 08:06
  • 1
    Related question: http://stackoverflow.com/questions/5764774/using-array-and-table-functions-in-mathematica-which-is-best-when/ – Leonid Shifrin Jan 03 '12 at 09:49
  • 4
    @David I agree with your statement, just want to stress that the real reason why `Table` is slow for symbolic iterators seems to be that it can not determine whether or not the result will be a *rectangular* array, and therefore, can not use packed arrays. The fact that it can not then use `Compile` follows from this - because the packed arrays are what gives `Compile` its efficiency gains (at least when compiling to MVM target, which is what I believe is happening in auto-compilation). – Leonid Shifrin Jan 03 '12 at 09:55
  • Another interesting observation: `Timing[Table[i + j, {i, Range[5000]}, {j, Range[5000]}];]` appears to be uncompiled as well, probably for the same reasons. Here, however, the only practical option of inlining is using `With` as in `With[{rr = Range[5000]}, Table[i + j, {i, rr}, {j, rr}]];`. It's not very practical to enter 5000 values as an input cell ... – Szabolcs Jan 09 '12 at 08:54
  • A great Q&A. It saved me a lot of time wandering in the unknown. One observation on @Nasser Edit(1) "for higher dimensions table, if at least ONE index is symbolic, then not packed/SLOW": I found that **it is true only for the inner index**. In the given example the symbolic index is the outer one and it still runs fast for me; once the symbolic index is the inner one, it becomes slow. – Lawrence Teo Apr 17 '15 at 00:11

3 Answers

17

So this is what I think is happening. The slowdown you see between a numeric and a symbolic limit on Table comes from the fact that you use a double index. Each sub-table (e.g. going over all indices j for a fixed index i) is constructed separately, and when the limit is symbolic there is an extra step of working out that limit before each sub-table is constructed. You can see this by examining, e.g.,

Trace[a = 3;
      tbl = Table[i + j, {i, 1, a}, {j, 1, a}];
     ]

David gives a good example of why you would want to do this check for every sub-list. As to why Mathematica cannot figure out when the check is not needed, I have no clue. If you only have one index to iterate over, there is no difference in speed between the symbolic and numeric versions:

Timing[tbl = Table[i + j, {j, 1, 1000}];]
{0.0012, Null}

Timing[a = 1000;
       tbl = Table[i + j, {j, 1, a}];
      ]
{0.0013, Null}

To answer your follow-up regarding speed: making tbl a function is faster for both numeric and symbolic limits.

Timing[a = 1000;
       tblFunc[a_] := Table[i + j, {i, 1, a}, {j, 1, a}];
       tblFunc[a];
      ]

{0.045171, Null}

vs.

Timing[tbl = Table[i + j, {i, 1, 1000}, {j, 1, 1000}];]
{0.066864, Null}

Timing[a = 1000;
       tbl = Table[i + j, {i, 1, a}, {j, 1, a}];
      ]
{0.632128, Null}

You gain even more speed if you intend to reuse the tbl construction.

b=1000;
Timing[tblFunc[b];]
{0.000013, Null}
Timo
  • 3
    I think that tblFunc is fast because the SetDelayed defining the function acts like a With when tblFunc is evaluated, i.e. a is replaced by its numerical value before the rhs expression is evaluated. Thus we can have the limit as a symbolic argument but still go back to the numerical argument case. – faysou Jan 03 '12 at 10:45
  • @Faysal, I think you are correct. Replacing the second iterator with `{j,1,i}` in `tblFunc` gives a timing of 0.34 which is again on par with the non-function symbolic version (albeit a bit faster). – Timo Jan 03 '12 at 13:25
3

The key things to monitor, as others have mentioned, are packing and list length. I actually don't see the differences that Timo reports:

ClearAll[tblFunc];
Timing[a = 1000;
 tblFunc[a_] := Table[i + j, {i, 1, a}, {j, 1, a}];
 Developer`PackedArrayQ[tblFunc[a]]]

{0.077706, True}

vs

ClearAll[tbl];
Timing[
 tbl = Table[i + j, {i, 1, 1000}, {j, 1, 1000}];
 Developer`PackedArrayQ[tbl]]

{0.076661, True}

ClearAll[tbl];
Timing[a = 1000;
 tbl = Table[i + j, {i, 1, a}, {j, 1, a}];
 Developer`PackedArrayQ[tbl]]

{1.02879, False}

So for me the only difference is whether the list is packed. Whether it is a function makes no difference to the timing on my setup. And, as expected, when you switch off autocompilation the timings are the same for all of the above, because no packing occurs:

SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> Infinity}];

{1.05084, False}

vs

{1.00348, False}

{1.01537, False}

Reset the table autocompile length:

SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> 250}]
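Rather than hardcoding 250 when resetting, one can read the current setting first and restore exactly that value afterwards. This is a sketch; the names saved and result are made up for illustration.

```mathematica
(* Remember the current setting, change it temporarily, then put it back *)
saved = SystemOptions["CompileOptions" -> "TableCompileLength"];

SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> Infinity}];
result = Table[i + j, {i, 1, 1000}, {j, 1, 1000}];   (* built with autocompilation off *)

SetSystemOptions[First[saved]];   (* restores "CompileOptions" -> {"TableCompileLength" -> old value} *)
SystemOptions["CompileOptions" -> "TableCompileLength"] === saved
```

This avoids leaving the kernel with a changed setting if the default differs between versions.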
Mike Honeychurch
  • But isn't the reason you found no difference in timing that, when you call a function, `a` is already a number inside the function, since M calls by VALUE? Please see edit(1) in my post for what I mean. Thanks – Nasser Jan 04 '12 at 00:15
  • I was just reproducing Timo's code and noting that I do not see the timing differences that he sees. The reason would seem to be that both are packed -- so I agree with your point. – Mike Honeychurch Jan 04 '12 at 00:20
2

This is slightly OT, but for speed here you might want to avoid using the item-by-item processing that's implicit in using Table. Rather, use Outer. Here's what I'm seeing on my system:

   Timing[Outer[Plus, Range[5000], Range[5000]];]
{0.066763,Null}

   Timing[Table[i + j, {i, 1, 5000}, {j, 1, 5000}];]
{0.555197,Null}

Quite a dramatic difference.
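Before swapping one for the other, it is easy to confirm on a small case that the two forms build the same matrix. A minimal sketch, with n, outerResult and tableResult as made-up names:

```mathematica
n = 5;
outerResult = Outer[Plus, Range[n], Range[n]];
tableResult = Table[i + j, {i, 1, n}, {j, 1, n}];

outerResult === tableResult    (* True *)
```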

murray
  • Interesting observation! I believe this is due to a special-casing of `Outer` for certain common arguments such as `Plus` or `Times`. My hunch is that for a general function, `Table` is more likely to be faster as it might compile. Another interesting data point: `Timing[Table[i + j, {i, Range[5000]}, {j, Range[5000]}];]` --> 16 seconds on my machine (i.e. uncompiled) – Szabolcs Jan 09 '12 at 08:49
  • The timings I got are for Mathematica 8.0.4 under OS X 10.6.8 on an iMac with a 3.4 GHz Core i7, 16 GB RAM. And a fresh kernel for each evaluation. – murray Jan 09 '12 at 17:01
  • My point was just that the more precise analogue of `Outer` using `Table` can apparently be quite slow. `Outer[Plus,Range[5000],Range[5000]]` is very fast (for `Plus` and similar functions only), `Table[i+j, {i,5000}, {j,5000}]` and `With[{r=Range[5000]}, Table[i+j, {i,r}, {j,r}]]` are fast, while `Table[i+j, {i,Range[5000]}, {j,Range[5000]}]` is quite slow. According to the other answers to this question, this is because of (the lack of) auto-compilation. My guess about the reason why `Outer` is so fast is that it is special-cased (optimized) for `Plus`. – Szabolcs Jan 09 '12 at 17:14
  • Yes, the advantage of Outer is clearly different for other functions. For example, with f[i_,j_] := Sin[i^2. j], the Timing result with Outer is roughly 34.5 whereas the Timing result with Table is just over 42. – murray Jan 09 '12 at 22:16