Mathematica "linked lists" and performance

Question

In Mathematica, I create singly linked lists like so:

toLinkedList[x_List] := Fold[pair[#2, #1] &, pair[], Reverse[x]];

fromLinkedList[ll_pair] := List @@ Flatten[ll];

emptyQ[pair[]] := True;
emptyQ[_pair] := False;

Using the symbol pair for the cons cells has the advantage of Flatten working safely even if the lists contain Mathematica-style Lists, and allows you to define custom notation using MakeExpression/MakeBoxes, which makes everything much more pleasant. In order to avoid having to muck around with $IterationLimit, I wrote functions to work with these lists using either While loops or NestWhile instead of using recursion. Naturally, I wanted to see which approach would be faster, so I wrote two candidates so I could watch 'em fight:

nestLength[ll_pair] := 
 With[{step = {#[[1, -1]], #[[-1]] + 1} &},
  Last@NestWhile[step, {ll, 0}, ! emptyQ@First@# &]];

whileLength[ll_pair] := 
 Module[{result = 0, current = ll},
  While[! emptyQ@current,
   current = current[[2]];
   ++result];
  result];

The results were very strange. I tested the functions on linked lists of length 10000, and whileLength was usually about 50% faster, at about 0.035 seconds to nestLength's 0.055 seconds. However, occasionally whileLength would take about ~4 seconds. I thought there might be some caching behavior, so I started generating fresh, random lists to check, and whileLength wouldn't necessarily be slow on the first run with a new list; it might take dozens of times to see the slowdown, but then it wouldn't recur (at least not for the 200 runs I was trying with each list).

What might be going on?

For reference, the function I used for testing is this:

getTimes[f_, n_] :=
 With[{ll = toLinkedList@RandomInteger[100, 10000]},
  Table[Timing[f@ll], {n}][[All, 1]]]

EDIT: I neglected to mention the version earlier; I got these results with Mathematica 8.

EDIT the second: When I read Daniel Lichtblau's answer, I realized that my times for "typical" runs omitted a leading 0. It's been fixed.

EDIT the third: I think Leonid Shifrin is correct to associate the issue with Module; I can get the same sort of behavior from the NestWhile-based version by replacing the With with a Module:

nestModuleLength[ll_pair] := 
  Module[{step = {#[[1, -1]], #[[-1]] + 1} &}, 
   Last@NestWhile[step, {ll, 0}, ! emptyQ@First@# &]];

In[15]:= Select[getTimes[nestModuleLength, 100], # > 3 &]
Out[15]= {3.797}

@Yaroslav Bulatov: I can't see why packed arrays would be relevant, since nothing should be packed except the initial `List` generated by `RandomInteger`, which is immediately converted into a tree-like expression. — Pillsy, Feb 23 '11 at 20:52
Are you using version 7 or 8? Regardless, for what it's worth, I think you've uncovered something of a bug or at least a soft spot in the evaluation behavior. — Daniel Lichtblau, Feb 24 '11 at 02:28
I can't take the credit for `Module`- @belisarius was the first to suggest that it was the culprit. — Leonid Shifrin, Feb 25 '11 at 16:26
It is not With or Module per se (and I believe I even reproduced the problem with Block). It is that there are collisions between symbols that fool the evaluator into thinking it needs to check whether expressions require reevaluation. — Daniel Lichtblau, Feb 25 '11 at 16:31
FWIW, a naive translation of this code to F# is shorter and between 300 and 30,000x faster than Mathematica: http://stackoverflow.com/a/8286376/13924 — J D, Nov 28 '11 at 20:48

score 9 · Accepted Answer · edited Feb 25 '11 at 14:38

The examples below give typical results.

One slow example in a length 20 run.

In[18]:= getTimes[whileLength, 20]

Out[18]= {0.031, 0.032, 0.031, 0.031, 0.031, 0.032, 0.031, 0.031, \
0.031, 0.047, 0.032, 0.031, 0.031, 3.547, 0.047, 0.031, 0.031, 0.032, \
0.031, 0.031}

I note in passing that the timings are ~10x faster than in the original post, except for the slow cases which are comparable. Not sure what accounts for that difference in ratios.

No slow examples.

In[17]:= getTimes[nestLength, 20]

Out[17]= {0.047, 0.047, 0.062, 0.047, 0.047, 0.062, 0.047, 0.047, \
0.047, 0.063, 0.046, 0.047, 0.047, 0.063, 0.047, 0.046, 0.047, 0.063, \
0.047, 0.047}

One slow example in a length 100 run.

In[19]:= getTimes[whileLength, 100]

Out[19]= {0.031, 0.031, 0.031, 0.032, 0.031, 3.594, 0.047, 0.031, \
0.031, 0.031, 0.032, 0.031, 0.031, 0.031, 0.032, 0.031, 0.047, 0.031, \
0.031, 0.031, 0.032, 0.031, 0.031, 0.031, 0.032, 0.047, 0.031, 0.031, \
0.031, 0.032, 0.031, 0.031, 0.031, 0.032, 0.031, 0.031, 0.047, 0.031, \
0.031, 0.032, 0.031, 0.031, 0.031, 0.032, 0.031, 0.031, 0.047, 0.031, \
0.032, 0.031, 0.031, 0.031, 0.032, 0.031, 0.031, 0.047, 0.031, 0.031, \
0.032, 0.031, 0.031, 0.031, 0.032, 0.031, 0.047, 0.031, 0.031, 0.032, \
0.031, 0.031, 0.031, 0.032, 0.031, 0.031, 0.031, 0.032, 0.046, 0.032, \
0.031, 0.031, 0.031, 0.032, 0.031, 0.031, 0.047, 0.031, 0.032, 0.031, \
0.031, 0.031, 0.032, 0.031, 0.047, 0.031, 0.031, 0.031, 0.032, 0.031, \
0.031, 0.031}

Mathematica implements, imperfectly, what is called "infinite evaluation". That is to say, an expression reevaluates until it stops changing. To make this reasonably fast there are various optimizations that attempt to short circuit the process whenever possible.

In some cases this can be tricky to discern (due to an effect similar to hash collisions), and expressions might be needlessly reevaluated. Deeply nested expressions tend to be a worst case for this. We have further code that will often address these even in cases of collisions.

The culprit in this instance is exactly this code that attempts to determine quickly whether an expression requires reevaluation. It is peculiar but possibly a clue (to someone) that this happens at most once in a run inside that While loop. So something happens in the bad cases that prevents recurrence whilst inside the same While.

At one time I was familiar with the reevaluation detection code, having written a chunk of it. But it was rewritten for version 8. So even after seeing this suboptimal behavior in a debugger, it is is something of a mystery to me. About all I can say right now is that I filed a bug report.

As Leonid Shifrin observed, symbols with Attribute of HoldAllComplete are immune to this problem. So using that attribute can be beneficial for this type of code.

Daniel Lichtblau Wolfram Research

Just FYI. I was trying to repro the behavior running under the Workbench to profile it. But it just run flawless! — Dr. belisarius, Feb 25 '11 at 16:49
@belisarius Was this version 8.0? That could in theory make a difference. Beyond that, I am flummoxed. (Not sure what that means exactly, but I suspect it has to do with being beaten over the head by a flumm ox. At leas that's how it feels, some days.) Also, if you changed names of things e.g. pair-->flair, that could cause a collision to disappear. — Daniel Lichtblau, Feb 25 '11 at 17:33
@belisarius This might elicit the bad behavior. In getTimes change last line to Table[Update[]; Timing[f@ll], {n}][[All, 1]] — Daniel Lichtblau, Feb 25 '11 at 17:35
Yes, version 8. I'll try the Update[] thing and come back with results. — Dr. belisarius, Feb 25 '11 at 17:38

score 7 · Answer 2 · answered Feb 23 '11 at 22:40

7

A disclaimer: the following is a speculation. This seems to be related to the search for UpValues. It looks like this has been optimized for global variables (so that the system skips this step when it can determine that it can do that), but not for Module - generated local variables. To test this, assign HoldAllComplete attribute to pair, and the effect disappears (since then, UpValues are not checked for current):

SetAttributes[pair, HoldAllComplete];

In[17]:= ll = toLinkedList@RandomInteger[100, 10000];
Max[Table[Timing[whileLength[ll]], {1000}][[All, 1]]]

Out[18]= 0.047

HTH

answered Feb 23 '11 at 22:40

Leonid Shifrin

22,449
4
68
100

"since then, UpValues are not checked for current"; can you elaborate on that? – acl Feb 24 '11 at 00:27
2

@acl: when evaluator meets expression `f[a]` and `f` has a `HoldAllComplete` attribute, not only is `a` not evaluated before `f` (what happens to `a` then entirely depends of the rules for `f`), but also `UpValues` associated with `a` are not checked, while they are checked when `f` has `HoldAll`. I am not completely sure that this is what plays a role here, but the r.h.s of an assignment looks like `Part[pair[num,pair[..]],2]`. When `Part` is evaluated, `pair[...]` is evaluated (search for `UpValues` of the form `pair[___,element,___]`), but not if `pair` is `HoldAllComplete`. – Leonid Shifrin Feb 24 '11 at 01:08
@acl: This was known as an "in-place" list modification problem in version 3, but seems to have been optimized in later versions (David Wagner has a discussion on this in his book, Chapter 7, p.211). My guess was that the optimization for in-place list (expression) modifications did not cover (or covered incompletely) the case of local variables generated by `Module`. As I said, I may well be wrong, this is just a guess. – Leonid Shifrin Feb 24 '11 at 01:12

Dr. belisarius · Answer 3 · 2011-02-24T12:48:03.660

Seems it's related to the Module local symbols memory management.

I'll show the timing series from some runs. Each run gives of course a unique Plot, but I checked "consistency" among runs. Look:

whileLength[l2_pair] := 
  Module[{result = 0}, current = l2; 
   While[! emptyQ@current, current = current[[2]];
    ++result];
   result];

gives the following Timing series:

While using only Global symbols:

whileLength[l2_pair] := 
  Module[{}, result = 0; current = l2; 
   While[! emptyQ@current, current = current[[2]];
    ++result];
   result];

gives:

Mathematica "linked lists" and performance

3 Answers3

Linked