15

I am coding up an algorithm for constructing a suffix tree in Mathematica based on Ukkonen's algorithm.

The question I have is, will passing my entire tree structure (which I have stored in a list) to a function to search through, cost my program a lot of memory and time since I have to use some of the functions multiple times in the algorithm?

For example, I have a function that searches for the children of a specific node, and I use the Select function to search the entire tree.

getChildren[parentID_] := Select[tree, #[[3]] == parentID &];

However I need to access the tree, so is it reasonable to pass the entire tree structure to the function? Since it doesn't seem that there is a way to make a variable global to the entire notebook. Or is there some alternate way to get around this?

Dr. belisarius
  • 60,527
  • 15
  • 115
  • 190
Steve
  • 1,105
  • 7
  • 20
  • 4
    do you mean by "there is no way to make a variable global to the entire notebook"? if you define `tree=5` then `tree` is 5 everywhere in the `Global` context (which is the default). it's global by default, unless you're after some other behaviour. – acl Nov 30 '11 at 11:08

2 Answers2

25

No, it does not cost extra memory to pass expressions. As is usual in functional languages, Mathematica objects are immutable: they cannot be modified, instead a new object is created when you transform them using some function. This also means that if you don't transform them, they're not copied, no matter how much you pass them around between functions.


From a user perspective, Mathematica expressions are trees, but I believe that internally they're stored as directed acyclic graphs, i.e. the same subexpression may be stored only once in memory, regardless of how many times it appears in the full expression (see e.g. the doc page of Share[]).

Here's an example to illustrate:

First, make sure In/Out don't take up extra memory:

In[1]:= $HistoryLength = 0;

Check memory usage:

In[2]:= MemoryInUse[]
Out[2]= 13421756

Let's make an expression that takes up a noticeable amount of memory:

In[3]:= s = f@Range[1000000];

In[4]:= MemoryInUse[]
Out[4]= 17430260

Now repeat this expression a hundred times ...

In[5]:= t = ConstantArray[s, 100];

... and notice that memory usage barely increases:

In[6]:= MemoryInUse[]
Out[6]= 18264676

ByeCount[] is misleading because it doesn't report the actual physical memory used, but the memory that would be used if common subexpressions weren't allowed to share the same memory:

In[7]:= ByteCount[t]
Out[7]= 400018040

An interesting point to note: if you remove f[...] from s, and make both s and t a plain numerical array, then this memory sharing will not happen, and memory usage will jump to ~400 MB.


Whether you make tree a global variable or an argument of getChildren, it will not make a difference in memory usage.

Szabolcs
  • 24,728
  • 9
  • 85
  • 174
  • 4
    Excellent answer! To further confirm the picture you described, one can do an assignment to some element in `t`, for example like so : `t[[1, 1, 1]] = 2; `, and notice an immediate jump in memory consumption corresponding to the size of a single `s` instance. The reason for what you observed with the numerical array is subtle: this is *only* because `Range` produces packed arrays, *and* you used `ConstantArray` (and not e.g. `Table`). The point is, `ConstantArray` will produce a packed array out of a packed array, and you can not share memory in a large packed array, since it is ... – Leonid Shifrin Nov 30 '11 at 11:47
  • 1
    ... based on direct C memory layout. Should you have used either ``ur = Developer`FromPackedArray[Range[1000000]]; t = ConstantArray[ur, {100}]`` or `r = Range[1000000];t = Table[r, {100}];`, and you would have observed the same memory sharing, since the result is not packed (meaning that there are intermediate pointers, and sharing is possible - or at least this is my picture of this at the moment). – Leonid Shifrin Nov 30 '11 at 11:51
  • I would slightly change the statement on immutability to something like: "they can not be modified, unless they are stored in a variable" - Mathematica is not a pure functional language, and mutability is possible. – Leonid Shifrin Nov 30 '11 at 17:04
  • @LeonidShifrin @Szabolcs My test shows `ConstantArray` also eats same amount of memory in the first example. See http://i.stack.imgur.com/Qakfh.png How to explain this? – user15964 Sep 03 '16 at 13:16
  • 1
    @user15964 I don't see the problem. In the first case, you construct a packed array, and wrap it into `f`. In the second case, you construct a constant array of references to `f[first-array]`. Perhaps, both pointers and integers take 8 bytes each, which is why the size of the memory use increase is the same. I don't have the time to dig deeper now. – Leonid Shifrin Sep 03 '16 at 13:23
  • @LeonidShifrin Oh, I am so stupid. I make a silly typos in `ConstantArray`, it should be 100, I write it 1000000. Thank you so much for answering ;-) – user15964 Sep 03 '16 at 13:28
4

Further to Szabolcs answers, if you do need to modify the data you may find this question on pass-by-reference useful:

simple question on passing data between functions

Community
  • 1
  • 1
Chris Degnen
  • 8,443
  • 2
  • 23
  • 40
  • 1
    Since you brought up this topic, let me mention that while the accepted answer to that question does work, I would generally advise against using `Unevaluated` in that manner, from the program design perspective: one should design functions self-contained, while there the function will only work if the user does not forget to wrap `Unevaluated` around the argument, and further, there is no way to catch this omission for the function itself. IMO, Hold-attributes are strongly preferred to `Unevaluated` in cases like that. – Leonid Shifrin Nov 30 '11 at 11:58
  • @ Leonid - I changed the link to refer to your answer. (That was my first intention anyway.) – Chris Degnen Nov 30 '11 at 16:45
  • Thanks, but my answer there is also not a general recommendation - the question there was constrained by the restrictions of CDF where explicit Hold attributes can not be attached to a symbol. Apparently, no one yet here on SO asked a simple question titled something like "Pass-by-reference in Mathematica" (or at least I am anaware of it), while having an SO discussion for such a topic seems very desirable. – Leonid Shifrin Nov 30 '11 at 16:55
  • @LeonidShifrin, I would like to ask if you have an updated version of your struct/object method shown in your post here http://groups.google.com/group/comp.soft-sys.math.mathematica/browse_thread/thread/b2ddf61da56b0be9/06849ed8cb17f590?pli=1 at the end of the thread. I like it, and thought to check if you have an updated version to use in a demo I am writing. If nothing changed, I will use the above code you posted. thanks – Nasser Dec 03 '11 at 02:00
  • @LeonidShifrin, Well, never mind. Your package posted in the link above works so well, and I really liked using it, but when I tested it inside a demonstration, many symbols used are not allowed in CDF. So unfortunately, I can't use it :( The symbols not allowed are----> "edit the notebook to remove the following illegal symbols: ClearAll, DownValues, Remove, SetAttributes, Symbol, Unprotect" ---> So I tried to edit your code to remove these symbols, but this broke things. It will be great if it possible to make one like this without these symbols, but that might not be possible. – Nasser Dec 03 '11 at 03:03
  • @Nasser It might be possible. I will have a look, whenever I get more time, hopefully soon. – Leonid Shifrin Dec 03 '11 at 09:24
  • @Nasser Do you, or anybody else, have a complete list of "illegal" symbols in CDF ? I found it kind of frustrating several times to readjust code for CDF to work and such a list might help. Or do I just not see in the "Documentation center" ? – Rolf Mertig Dec 03 '11 at 22:49
  • Here is the MathGroup link to Leonid's [struct/object method post](http://forums.wolfram.com/mathgroup/archive/2010/Oct/msg00374.html), in case the googlegroups link doesn't work. – Chris Degnen Dec 16 '15 at 12:18