8

I have a function that's taking a long time to run. When I profile it, I find that over half the time (26 of 50 seconds) is not accounted for in the line-by-line timing breakdown, and I can show, with the following method, that the time is spent after the function's last line executes but before control returns to the caller:

ts1 = tic;
disp('calling function');
functionCall(args);
disp(['control returned to caller - ', num2str(toc(ts1))]);

The first line of the function I call is ts2 = tic, and the last line is

disp(['last line of function - ', num2str(toc(ts2))]);

The result is

calling function

last line of function - 24.0043

control returned to caller - 49.857

Poking around on the interwebs, I think this is a symptom of the way MATLAB manages memory: it deallocates variables when a function returns, and sometimes this takes a long time. The function does allocate some large (~1 million element) arrays. It also works with handles, but does not create any new handle objects or store handles explicitly. My questions are:

  1. Is this definitely a memory management problem?
  2. Is there any systematic way to diagnose what causes a problem in this function, as opposed to others which return quickly?
  3. Are there general tips for reducing the amount of time MATLAB spends cleaning up on a function exit?
Marc
    No MathWorks staff member has cared to answer this question so far. Thus they silently acknowledge that it is a fundamental MATLAB design flaw. – Mikhail Poda Nov 26 '10 at 18:22

3 Answers

4

You are right, it seems to be time spent on garbage collection. I am afraid it is a fundamental MATLAB flaw; it has been known for years, but MathWorks has not solved it even in the newest MATLAB version, 2010b.

You could try setting variables to [] manually before leaving the function - i.e. doing the garbage collection yourself. This technique also helps against memory leaks in older MATLAB versions. MATLAB will then spend the time on myVar = []; rather than on the function's end.
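For example, a minimal sketch of the manual-cleanup idea (the function and variable names are placeholders, not from the asker's code):

```matlab
function out = functionCall(args)
bigA = zeros(1, 1e6);       % large temporary arrays, as in the question
bigB = zeros(1, 1e6);
out = sum(bigA + bigB);     % placeholder computation
bigA = [];                  % free the large arrays manually...
bigB = [];                  % ...so the time is spent here, not on 'end'
end
```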

You could also alleviate the problem by avoiding anything that creates references: anonymous functions, nested functions, handle classes, and functions such as cellfun and arrayfun.

If you have arrived at the "performance barrier" of MATLAB, then maybe you should simply change environments. I do not see much sense in starting a new project in MATLAB today unless you are using Simulink. Python is excellent for technical computing, and with C# you can do many of the things MATLAB does using free libraries. Both are general-purpose programming languages, and both are free, unlike MATLAB.

Mikhail Poda
  • Is myVar = [] superior to clear myVar? I've tried clearing all variables except the function return value (using whos to determine the variable names in the workspace) before exiting, but it doesn't help. – Marc Nov 24 '10 at 18:13
  • I do not know, try it and post your findings. Which MATLAB version are you using? – Mikhail Poda Nov 24 '10 at 18:29
2

I discovered a fix to my specific problem that may be applicable in general.

The function that was taking a long time to exit was called on a basic object that contained a vector of handle objects. When I changed the definition of the basic object to extend handle, I eliminated the lag on the close of the function.

What I believe was happening is this: When I passed the basic object to my function, it created a copy of that object (MATLAB is pass by value by default). This doesn't take a lot of time, but when the function exited, it destroyed the object copy, which caused it to look through the vector of handle objects to make sure there weren't any orphans that needed to be cleaned up. I believe it is this operation that was taking MATLAB a long time.

When I changed the object I was passing to a handle, no copy was made in the function workspace, so no cleanup of the object was required at the end.

This suggests a general rule to me:

If a function is taking a long time to clean up its workspace on exiting, and you are passing a lot of data or complex structures by value, try encapsulating the arguments to that function in a handle object.

This will avoid duplication and hence the time-consuming cleanup on exit. The downside is that your function can now unexpectedly change your inputs, because MATLAB doesn't have the ability to declare an argument const, as in C++.
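As a minimal sketch of this rule (the class and property names are hypothetical, not from the original code), the change amounts to deriving the container class from handle:

```matlab
% Before: a value class. Passing an instance to a function creates a
% copy, and destroying that copy on exit forces MATLAB to walk the
% vector of handle objects looking for orphans.
%
% classdef BasicObject
%     properties
%         items   % vector of handle objects
%     end
% end

% After: a handle class. The function receives a reference, so no copy
% is made and no per-call cleanup of the object is needed.
classdef BasicObject < handle
    properties
        items   % vector of handle objects
    end
end
```

With the handle version, the function can also modify the object in place, which is exactly the downside noted above.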

Marc
  • MATLAB is not pass-by-value by default, see this post: http://blogs.mathworks.com/loren/2006/05/10/memory-management-for-functions-and-variables/. But apparently this rule is not being applied for value objects. – Mikhail Poda Dec 04 '10 at 08:32
  • MATLAB's behavior is pass-by-value by default. So if there is a vector of handle objects, and you make a change to that vector in a function, MATLAB needs to deal with the consequences of that change on exit, even if the internal implementation did not make a copy of that vector. Likely, this means that when the function is called, the reference counter on the objects is incremented by one, and when the function exits, the reference counter is decremented by one. This should be fast, but probably some other part of the MATLAB scheme, especially with nested and recursive objects, isn't. – Marc Dec 06 '10 at 13:48
0

A simple fix could be this: pre-allocate the large arrays and pass them as arguments to your functionCall(). This moves the deallocation issue back to the caller of functionCall(), but if functionCall() is called more often than its parent, this will speed up your code.

workArr = zeros(1,1e6); % allocate once
...
functionCall(args,workArr); % call with extra argument
...
functionCall(args,workArr); % call again, no realloc of workArr needed
...

Inside functionCall you can take care of initializing and/or resetting workArr, for instance

workArr(:) = 0; % reset work array in place
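Putting it together, a hypothetical version of functionCall that reuses the buffer might look like this (the names are the placeholders used above). Note that MATLAB's copy-on-write semantics mean the first write to workArr inside the function may still trigger a copy, as the comment below points out:

```matlab
function out = functionCall(args, workArr)
workArr(:) = 0;          % reset the reusable buffer in place
% ... fill workArr with intermediate results based on args ...
out = sum(workArr);      % placeholder computation
end
```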
ric0liva
    This is not likely to help in most situations. As soon as functionCall modifies workArr, MATLAB will create a copy of workArr to store the changes made in functionCall. The result is that memory consumption is doubled and time is wasted performing the copy. If workArr is a handle object, this approach _could_ be an improvement. Benchmarking is certainly required in this case. – Arthur Ward Mar 24 '11 at 19:21