9

in Lisp-like systems, cons is the normal way to PREPEND an element to a list. Functions that append to a list are much more expensive because they walk the list to the end and then replace the final null with a reference to the appended item. IOW (pseudoLisp)

(prepend list item) = (cons item list) = cheap!
(append list item) = (cond ((null? list) (cons item null))
                           (#t (cons (car list (append (cdr list) item)))))

Question is whether the situation is similar in Mathemtica? In most regards, Mathematica's lists seem to be singly-linked like lisp's lists, and, if so, we may presume that Append[list,item] is much more expensive than Prepend[list,item]. However, I wasn't able to find anything in the Mathematica documentation to address this question. If Mathematica's lists are doubly-linked or implemented more cleverly, say, in a heap or just maintaining a pointer-to-last, then insertion may have a completely different performance profile.

Any advice or experience would be appreciated.

Reb.Cabin
  • 5,426
  • 3
  • 35
  • 64

4 Answers4

21

Mathematica's lists are not singly linked lists like in Common Lisp. It is better to think of mathematica lists as array or vector like structures. The speed of insertion is O(n), but the speed of retrieval is constant.

Check out this page of Data structures and Efficient Algorithms in Mathematica which covers mathematica lists in further detail.

Additionally please check out this Stack Overflow question on linked lists and their performance in mathematica.

j.c.
  • 103
  • 1
  • 3
nixeagle
  • 1,002
  • 8
  • 17
  • 4
    +1. Just to add to this nice answer, the fact that Mathematica lists are implemented as arrays has very far-reaching consequences, affecting all aspects of programming in Mathematica, from programming styles (rule-based, functional, etc) to performance-tuning. In particuar, one should be well aware of that for writing any code where performance is important. – Leonid Shifrin Dec 09 '11 at 21:45
  • 1
    I think though it is unfortunate that the most important structure is missing in Mathematica, which is the struct (or record). This makes it hard to manage data in large programs, when one is passing many things between functions, as one can not arrange related data in records, instead of passing individual parameters. For small programs this does not matter. I do not see how one can design very large programs without having at least a record data structure. All the current solutions to emulate a struct in Mathematica are not working too well. – Nasser Dec 10 '11 at 04:14
  • 2
    @Nasser, here is your stuct: name[val11, val2,...] write selectors getVal1[a_name]:=a[[1]] etc and of you go. Mathematica is written in Mathematica to a large extend (roughly 1MLOC) I think that is big. –  Dec 10 '11 at 08:48
  • 1
    @rubenko, can one pass such a "structure" to compiled functions? – asim Dec 11 '11 at 13:40
  • @Nasser -- I use definitions like records; I think of them as hashtables or JavaScript objects. To model a key-value pair "inside" a variable r, do something like this r["a"]=1;r[2]="b"; Report all values with ??r. Retrieve a value like r["a"] or r[2]. Check for non-existence of a value for key foo with r[foo]===r[foo]. I've written a whole record-processing library (or k-v DB library, if you prefer) with this trick. Can round-trip with association lists (explicit lists of k-v pairs) easily. I'll post a link later. – Reb.Cabin Dec 11 '11 at 16:04
  • @Nasser -- a better way to check for non-existence is MatchQ[r[foo],r[_]]. This check is no good for certain cyclic structures, that is, where r[foo] actually DOES intentionally equal r[somethingElse], but it's good for most straightforward data scenarios. – Reb.Cabin Dec 11 '11 at 20:35
  • @Nasser -- here's a push of my "records" library: https://github.com/rebcabin/Mathematica-MiniData – Reb.Cabin Dec 16 '11 at 21:44
9

As a small add on, here is an efficient alternative to "AppendTo" in M-

myBag = Internal`Bag[]
Do[Internal`StuffBag[myBag, i], {i, 10}]
Internal`BagPart[myBag, All]
  • Is there any way to implement a stack using `Bag[]`? That is, can we `pop` a Bag? (sounds funny...) – Mr.Wizard Dec 10 '11 at 04:45
  • @ruebenko -- How can I inspect some of the other definitions in the "Internal" namespace? ??Internal and ??Internal` didn't work :) – Reb.Cabin Dec 11 '11 at 16:09
  • To the best of my knowledge that is not possible. –  Dec 11 '11 at 16:25
  • 1
    @Reb.Cabin, you want to use ``?Internal`*`` which generates a list of everything in the `Context`. But, they're undocumented, so you'll only find a list of them, not what they can do. – rcollyer Dec 12 '11 at 15:37
  • And most of the stuff is irrelevant it is just a few that actually do have an outside usage. –  Dec 12 '11 at 15:58
8

Since, as already mentioned, Mathematica lists are implemented as arrays, operations like Append and Prepend cause the list to be copied every time an element is added. A more efficient method is to preallocate a list and fill it, however my experiment below didn't show as great a difference as I expected. Better still, apparently, is the linked-list method, which I shall have to investigate.

Needs["PlotLegends`"]
test[n_] := Module[{startlist = Range[1000]},
   datalist = RandomReal[1, n*1000];
   appendlist = startlist;
   appendtime = 
    First[AbsoluteTiming[AppendTo[appendlist, #] & /@ datalist]];
   preallocatedlist = Join[startlist, Table[Null, {Length[datalist]}]];
   count = -1;
   preallocatedtime = 
    First[AbsoluteTiming[
      Do[preallocatedlist[[count]] = datalist[[count]]; 
       count--, {Length[datalist]}]]];
   {{n, appendtime}, {n, preallocatedtime}}];
results = test[#] & /@ Range[26];
ListLinePlot[Transpose[results], Filling -> Axis, 
 PlotLegend -> {"Appending", "Preallocating"}, 
 LegendPosition -> {1, 0}]

Timing chart comparing AppendTo against preallocating. (Run time: 82 seconds)

timing chart

Edit

Using nixeagle's suggested modification improved the preallocation timing considerably, i.e. with preallocatedlist = Join[startlist, ConstantArray[0, {Length[datalist]}]];

enter image description here

Second Edit

A linked-list of the form {{{startlist},data1},data2} works even better, and has the great advantage that the size does not need to be known in advance, as it does for preallocating.

Needs["PlotLegends`"]
test[n_] := Module[{startlist = Range[1000]},
   datalist = RandomReal[1, n*1000];
   linkinglist = startlist;
   linkedlisttime = 
    First[AbsoluteTiming[
      Do[linkinglist = {linkinglist, datalist[[i]]}, {i, 
        Length[datalist]}];
      linkedlist = Flatten[linkinglist];]];
   preallocatedlist = 
    Join[startlist, ConstantArray[0, {Length[datalist]}]];
   count = -1;
   preallocatedtime = 
    First[AbsoluteTiming[
      Do[preallocatedlist[[count]] = datalist[[count]]; 
       count--, {Length[datalist]}]]];
   {{n, preallocatedtime}, {n, linkedlisttime}}];
results = test[#] & /@ Range[26];
ListLinePlot[Transpose[results], Filling -> Axis, 
 PlotLegend -> {"Preallocating", "Linked-List"}, 
 LegendPosition -> {1, 0}]

Timing comparison of linked-list vs preallocating. (Run time: 6 seconds)

enter image description here

Chris Degnen
  • 8,443
  • 2
  • 23
  • 40
  • 5
    At least on *Mathematica* 8 I get a *vast* performance improvement by changing `Table[Null, {Length[datalist]}]` to `ConstantArray[0, {Length[datalist]}]` The resulting behavior is at least an order of magnitude faster for large inputs and mirrors the expected algorithmic complexity. – nixeagle Dec 09 '11 at 22:55
  • 3
    The improvement comes from replacing Null with 0. preallocatedlist = Join[startlist, Table[0, {Length[datalist]}]] gives the same performance improvement – asim Dec 10 '11 at 00:34
  • Both Table `Null` and Table `0` are quite fast in v7, much faster than shown in the first plot above. ConstantArray is a bit faster still. It appears there is a large slowdown introduced in v8 regarding `Table[Null, {x}]`. – Mr.Wizard Dec 10 '11 at 04:58
  • The above could still be improved on. You are mixing types. startlist is integer while data list is real. You could add a Developer`ToPackedArray@Flatten[linkinglist] for subsequent joy. Also note that the preallocatedlist is not packed. coun-- is not needed. use Do[bla;bla,{count, Length[],1}]. It were also nice to compare this to the Bag shown above. –  Dec 10 '11 at 09:26
  • One further comment - all to offten I see code that uses some sort of appending. There are of course situations where this is the right thing to do. I find it useful to think about the problem for a little and see if the size can be pre computed and the do a vectorized Set a la data=ConstantArray[0,{..}] data[[pos]]=vals. –  Dec 10 '11 at 09:28
  • @Chris - It's a little stunning that the linkinglist solution is the fastest -- explicit appending (avoiding the Append function) followed by a gigantic Flatten! Great to know, however! BTW, I reproduced all your results. – Reb.Cabin Dec 11 '11 at 17:04
  • I think I see what's going on: the "nested list" in Mathematica is actually a simulation of LISP's cons cells, flipped! It's a chained structure of 2-arrays with CDRs at the left and CARs at the right, i.e. {{{{},1},2},3}. Adding to the ENDs of these is efficient for the same reason that adding to the BEGINNING of LISP lists is efficient. Then just flatten! – Reb.Cabin Dec 11 '11 at 18:26
8

If you know how many elements your result will have and if you can calculate your elements, then the whole Append, AppendTo, Linked-List, etc is not necessary. In the speed-test of Chris, the preallocation only works, because he knows the number of elements in advance. The access operation to datelist stands for the virtual calculation of the current element.

If the situation is like that, I would never use such an approach. A simple Table combined with a Join is the hell faster. Let me reuse Chris' code: I add the preallocation to the time measurement, because when using Append or the linked list, the memory allocation is measured too. Furthermore, I really use the resulting lists and check wether they are equal, because a clever interpreter maybe would recognize simple, useless commands an optimize these out.

Needs["PlotLegends`"]
test[n_] := Module[{
    startlist = Range[1000],
    datalist, joinResult, linkedResult, linkinglist, linkedlist, 
    preallocatedlist, linkedlisttime, preallocatedtime, count, 
    joinTime, preallocResult},


   datalist = RandomReal[1, n*1000];
   linkinglist = startlist;
   {linkedlisttime, linkedResult} = 
    AbsoluteTiming[
     Do[linkinglist = {linkinglist, datalist[[i]]}, {i, 
       Length[datalist]}];
     linkedlist = Flatten[linkinglist]
     ];

   count = -1;
   preallocatedtime = First@AbsoluteTiming[
      (preallocatedlist = 
        Join[startlist, ConstantArray[0, {Length[datalist]}]];
       Do[preallocatedlist[[count]] = datalist[[count]];
        count--, {Length[datalist]}]
       )
      ];

   {joinTime, joinResult} =
    AbsoluteTiming[
     Join[startlist, 
      Table[datalist[[i]], {i, 1, Length[datalist]}]]];
   PrintTemporary[
    Equal @@@ Tuples[{linkedResult, preallocatedlist, joinResult}, 2]];
   {preallocatedtime, linkedlisttime, joinTime}];

results = test[#] & /@ Range[40];
ListLinePlot[Transpose[results], PlotStyle -> {Black, Gray, Red}, 
 PlotLegend -> {"Prealloc", "Linked", "Joined"}, 
 LegendPosition -> {1, 0}]

enter image description here

In my opinion, the interesting situations are, when you don't know the number of elements in advance and you have to decide ad hoc whether or not you have to append/prepend something. In those cases Reap[] and Sow[] maybe worth a look. In general I would say, AppendTo is evil and before using it, have a look at the alternatives:

n = 10.^5 - 1;
res1 = {};
t1 = First@AbsoluteTiming@Table[With[{y = Sin[x]},
      If[y > 0, AppendTo[res1, y]]], {x, 0, 2 Pi, 2 Pi/n}
     ];

{t2, res2} = AbsoluteTiming[With[{r = Release@Table[
        With[{y = Sin[x]},
         If[y > 0, y, Hold@Sequence[]]], {x, 0, 2 Pi, 2 Pi/n}]},
    r]];

{t3, res3} = AbsoluteTiming[Flatten@Table[
     With[{y = Sin[x]},
      If[y > 0, y, {}]], {x, 0, 2 Pi, 2 Pi/n}]];

{t4, res4} = AbsoluteTiming[First@Last@Reap@Table[With[{y = Sin[x]},
        If[y > 0, Sow[y]]], {x, 0, 2 Pi, 2 Pi/n}]];

{res1 == res2, res2 == res3, res3 == res4}
{t1, t2, t3, t4}

Gives {5.151575, 0.250336, 0.128624, 0.148084}. The construct

Flatten@Table[ With[{y = Sin[x]}, If[y > 0, y, {}]], ...]

is luckily really readable and fast.

Remark

Be careful trying this last example at home. Here, on my Ubuntu 64bit and Mma 8.0.4 the AppendTo with n=10^5 takes 10GB of Memory. n=10^6 takes all of my RAM which is 32GB to create an array containing 15MB of data. Funny.

halirutan
  • 4,281
  • 18
  • 44