8

I have complex code blocks, in a Matlab script, that act on large, non-sparse arrays. The code performs many write operations to random elements in the arrays, as well as read operations. The identical code must execute against different (large) arrays (i.e., the same code blocks, except for different array variable names).

I do not want to have long, duplicated code blocks that differ only in the array names.

Unfortunately, when I create a function to perform the operations, so that the code block appears only once, the performance slows down by a factor of 10 or more (presumably due to the copying of the array). However, I do not need the array copied. I would prefer to "pass by reference", so that the purpose of the function call is ONLY to avoid having duplicated code blocks. There seems to be no way to avoid the copy-on-write semantics, however.

Also, it is impossible (so far as I understand) to create a script (not a function) to achieve this, because the script must contain identical variable names as the calling script, so I would need a different script for every array on which I wish to run the script, which gains nothing (I still would have duplicated code blocks).

I have looked into creating an alias variable name to "substitute" for the array variable name of interest, in which case I could call a script and avoid duplicated code. However, I cannot find any way to create an alias in Matlab.

Finally, I have attempted writing a function that utilizes the evalin() function, and passing the string name of the array variable to this function, but although this works, the performance is also vastly slower - about the same as passing the arrays by value to a function (at least a 10 times decay in performance).

I am coming to the conclusion that, in Matlab, complex operations on non-sparse arrays cannot be factored out of duplicated code blocks without incurring ghastly overhead, whichever technique is used to avoid the duplication.

I find this hard to believe, but I cannot find a way around it.

Does anybody know of a way to avoid duplicated code blocks when performing identical intricate operations on multiple non-sparse arrays in Matlab?

Dan Nissenbaum
  • [This answer](http://stackoverflow.com/a/3427461/855026) might help. Matlab uses "copy on write" semantics, so matrices are in fact passed "by reference" to functions. I'm afraid I don't know why you are experiencing such a dramatic drop in performance though. Does your function change values in the input matrices? – Brian L Oct 25 '12 at 22:59
  • As I noted, I am performing many write operations to random elements in the arrays. I understand that this is why Matlab is, sadly, triggering "copy on write" in extensive fashion in my case. The reason for the dramatic drop in performance seems clear: Large, non-sparse arrays are being frequently (and unnecessarily) copied. The copies are unnecessary, as I want the writes to act on the *original* arrays. However, as noted, I cannot find any way to "pass by reference" *even* in the case of write semantics (i.e., I do *not* want "copy on write" semantics). – Dan Nissenbaum Oct 25 '12 at 23:03
  • I am, indeed, giving the input and output variables the same name. Are you absolutely certain that this completely prevents copy-on-write semantics? I am seeing a tremendous decrease in performance for *identical* code (not a single line changed), just by wrapping the code in a function call (AND being certain to use the same variable names within the function for the input and output variables). – Dan Nissenbaum Oct 25 '12 at 23:05
  • The "handle class" referred to in the answer I linked above (or perhaps just a class in general) is probably what you need. Try: http://www.mathworks.com.au/help/matlab/ref/handle.html – Brian L Oct 25 '12 at 23:07
  • @BrianL - Fabulous - Handles seem the way to go. I will figure out the syntax. However, if you have the syntax down, and feel like posting a brief answer that simply creates an array, and then creates a handle containing (or referring to) that array, that would be helpful. – Dan Nissenbaum Oct 25 '12 at 23:10

5 Answers

9

As noted by Loren on her blog, MATLAB does support in-place operations on matrices, which essentially covers passing arrays by reference, modifying them in a function, and returning the result. You seem to know that, but you erroneously conclude that it cannot help because the code must use the same variable names as the calling script. Here is a code example that shows this is wrong. When testing, please copy it verbatim and save it as a function:

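function inplace_test
y = zeros(1,1e8);
x = zeros(1,1e8);

% compute: the output name differs from the input name, so a new array is allocated
tic; x = compute(x); toc
tic; y = compute(y); toc
% computeIP: same name for input and output, so MATLAB can update in place
tic; x = computeIP(x); toc
tic; y = computeIP(y); toc
% reference: the same operation without a function call
tic; x = x+1; toc
end

function x=computeIP(x)
x = x+1;
end

function y=compute(x)
y = x+1;
end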

Time results on my computer:

Elapsed time is 0.243335 seconds.
Elapsed time is 0.251495 seconds.
Elapsed time is 0.090949 seconds.
Elapsed time is 0.088894 seconds.
Elapsed time is 0.090638 seconds.

As you see, the last two function calls, which use the in-place function, are equally fast for both input arrays x and y. They are also as fast as running x = x+1 without a function. The only important thing is that the function's input and output arguments have the same name. And there is one more thing...

If I had to guess what is wrong with your code, I'd say you wrote nested functions and expected them to operate in place. They do not. So in the code below, the in-place optimization is not applied:

function inplace_test
y = zeros(1,1e8);
x = zeros(1,1e8);

tic; x = compute(x); toc
tic; y = compute(y); toc
tic; x = computeIP(x); toc
tic; y = computeIP(y); toc
tic; x = x+1; toc

    function x=computeIP(x)
        x = x+1;
    end

    function y=compute(x)
        y = x+1;
    end
end

Elapsed time is 0.247798 seconds.
Elapsed time is 0.257521 seconds.
Elapsed time is 0.229774 seconds.
Elapsed time is 0.237215 seconds.
Elapsed time is 0.090446 seconds.

The bottom line: be careful with those nested functions.

angainor
4

You may try to put all of your arrays into a single cell array and index into it, instead of referring to the arrays by name. A function will still copy the arrays, but a script can do the job.
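A minimal sketch of the cell-array idea, written as a loop inside one function so that it is self-contained. The name `cell_index_demo` and the toy operations are hypothetical stand-ins for the real code block, and whether this avoids copies for large arrays is an assumption that should be checked with `tic`/`toc` on the MATLAB version in use.

```matlab
function arrays = cell_index_demo()
% Hypothetical example: two small arrays stand in for the large ones.
arrays = {zeros(1, 5), zeros(1, 5)};

for k = 1:numel(arrays)
    % The code block appears once and addresses the array by cell index
    % rather than by variable name.
    arrays{k}(3) = k;                    % write to one element
    arrays{k}(1) = arrays{k}(3) + 1;     % read one element, write another
end
end
```

The body of the loop could equally live in a script that reads the index `k` from the calling workspace, which is closer to what is suggested above.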

VBel
  • This is a good idea, but unfortunately, although it gives the correct result, it does not prevent the decay in performance. Note that I have used a *script*, not a function, that contains the code of interest; as you suggest, I set an index parameter to the desired cell index containing my array, and I simply call the script over and over (but do not copy different arrays into the cell; I simply access the arrays in the various cells and then write values into these arrays). – Dan Nissenbaum Oct 26 '12 at 00:18
  • Unfortunately, it does, indeed, seem that Matlab may be fundamentally incapable of allowing operations on arrays without amateurish and dangerous duplication of code blocks. It's sad. – Dan Nissenbaum Oct 26 '12 at 00:19
2

Another answer:

There is a good article In-place Operations on Data. Apparently, there may be two pitfalls:

  1. (this is trivial and you probably did it) You should use the same in and out variable name not only in the definition of the function, but also where you call it.
  2. This only works if you call your function from ANOTHER FUNCTION, not from the command line. Weird... I tried, and, though there is an overhead, it is very small (for 10000-by-10000 arrays it took 1 sec from the command line and 0.000361 sec from another function).
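The two points above can be sketched as follows for writes to scattered elements. The names `inplace_call_demo` and `scatter_write` are hypothetical, and the in-place behavior is an optimization MATLAB may apply rather than a guarantee, so it is worth timing on real data.

```matlab
function [x, y] = inplace_call_demo()
% The caller is itself a function (pitfall 2), and each call assigns the
% result back to the variable that was passed in (pitfall 1).
x = zeros(1, 6);
y = ones(1, 6);

x = scatter_write(x, [2 5], [10 20]);
y = scatter_write(y, [1 6], [7 8]);
end

function a = scatter_write(a, idx, vals)
% Input and output share the name "a", which is what allows MATLAB to
% consider updating the array in place instead of copying it.
a(idx) = vals;
end
```

The same `scatter_write` serves both `x` and `y`, so the code block exists only once regardless of the caller's variable names.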

If this does not work for you, you may use an undocumented feature that allows you to perform in-place operations in a C++ MEX file. This is nasty, but here is an article about just that: Matlab mex in-place editing

BenMorel
VBel
2

The handle solution suggested by Brian L does work, although the first call that modifies the wrapped data takes a long time (because it has to make a copy of the original data).

Try this:

SomeData.m

classdef SomeData < handle
    % Handle wrapper: every copy of the object refers to the same data.
    properties
        X   % the wrapped array
    end
    methods
        function obj = SomeData(x)
            if nargin > 0
                obj.X = x;
            else
                obj.X = [];
            end
        end
    end
end

LargeOp.m

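function directArray = LargeOp( someData, directArray )
    if nargin > 1
        % write to the array that was passed by value (and is returned)
        directArray(1,1) = rand(1);
    else
        % write to the array held by the handle object
        someData.X(1,1) = rand(1);
        directArray = [];
    end
end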

Script to test performance

large = zeros(10000,10000);

data = SomeData(large);

tic
LargeOp(data);
toc

tic
large = LargeOp(data,large);
toc

tic
LargeOp(data);
toc

tic
large = LargeOp(data,large);
toc

Results

Elapsed time is 0.364589 seconds.
Elapsed time is 0.450668 seconds.
Elapsed time is 0.001073 seconds.
Elapsed time is 0.443150 seconds.
grantnz
1

Depending on your needs, you can accomplish this by making a nested function.

function A = evensarenegative(n)
    A = zeros(n,1);

    for i = 1:n
        if mod(i,2)
            nested1(i)
        else
            nested2(i)
        end
    end

    function nested1(i)
        A(i) = i;
    end

    function nested2(i)
        A(i) = -i;
    end
end

Here, the functions share the same workspace, in particular the A matrix, so no variables are ever copied. I find it to be a convenient way to organize code, especially when I have a lot of minor (but possibly verbose) operations as part of a larger workflow.

drhagen
  • However, the nested functions must use the *same variable name* as the variable defined in the enclosing function. Is it possible to use nested functions where *different* variables are acted upon (in identical fashion) within the nested function - i.e., *as though* those variables were passed as arguments to the nested function? – Dan Nissenbaum Oct 25 '12 at 23:18
  • I cannot think of a way to do that off the top of my head. If that's a main goal, then you will have to go with @Brian's handle suggestion. – drhagen Oct 25 '12 at 23:23
  • Unfortunately, @BrianL's solution does not work. Using handles gives the correct result, but exhibits the same drastic decrease in performance. – Dan Nissenbaum Oct 26 '12 at 00:03