3

Here is my problem (big picture). I have a project which uses large and complicated (by which I mean contains multiple levels of nested structures) Matlab structures. This is predictably slow (especially when trying to load / save). I am attempting to improve runtimes by converting some of these structures into Java Objects. The catch is that the data in these Matlab structures is accessed in a LOT of places, so anything requiring a rewrite of access syntax would be prohibitive. Hence, I need the Java Objects to mimic as closely as possible the behavior of Matlab structures, particularly when it comes to accessing the values stored within them (the values are only set in one place, so the lack of operator overloading in java for setting isn't a factor for consideration).

The problem (small picture) that I am encountering lies with accessing data from an array of these structures. For example,

person(1)
  .age = 20
  .name
    .first = 'John'
    .last = 'Smith

person(2)
  .age = 25
  .name
    .first = 'Jane'
    .last = 'Doe'

Matlab will allow you to do the following,

>>age = [person(1:2).age]
age = 
    20    25

Attempting to accomplish the same with Java,

>>jperson = javaArray('myMatlab.Person', 2);
>>jperson(1) = Person(20, Name('John', 'Smith'));
>>jperson(2) = Person(25, Name('Jane', 'Doe'));
>>age = [jperson(1:2).age]
??? No appropriate method or public field age for class myMatlab.Person[]

Is there any way that I can get the Java object to mimic this behavior? The first thought I had was to simply extend the Person[] class, but this doesn't appear to be possible because it is final. My second approach was to create a wrapper class containing an ArrayList of Person, however I don't believe this will work either because calling

wrappedPerson(1:2)

would either be interpreted as a constructor call to a wrappedPerson class or an attempt to access elements of a non-existent array of WrappedPerson (since java won't let me override a "()" operator). Any insight would be greatly appreciated.

The code I am using for my java class is

public class Person {
  int _age;
  ArrayList<Name> _names;

  public Person(int age, Name name) {
    _age = age;
    _names.add(name);
  }

  public int age() {return _age;}
  public void age(int age) {_age = age;}
  public Name[] name() {return _names.toArray(new Name[0]);}
  public void name(Name name) { _names.add(name);}
}

public class Name {
  String _first;
  String _last;

  public Name(String first, String last) {
    _first = first;
    _last = last;
  }

  public int first() {return _first;}
  public void first(String firstName) {_first = firstName;}
  public int last() {return _last;}
  public void last(String lastName) {_last = lastName;}
}
Community
  • 1
  • 1
PaxRomana99
  • 564
  • 1
  • 5
  • 12
  • How were you expecting to pass in 1:2, considering : isn't a Java operator? – Garrett Hall Mar 20 '13 at 17:15
  • Matlab automatically handles that conversion. If you had an array of Person of size 4, person = new Person[4], you could call person(1:2) in Matlab and it would return the first two elements of the array as an array of Person size 2. – PaxRomana99 Mar 20 '13 at 17:21
  • Waitaminnit... does this `[foo(1:3).bar.baz]` syntax even work with normal Matlab structs, or does it throw a `()-indexing must appear last in an index expression` error? – Andrew Janke Mar 20 '13 at 22:42
  • Depends on what "bar" is. If it is an array of structs, then you would have to do that in two steps, {a = [foo(1:3).bar]; a = [a(:).baz]} – PaxRomana99 Mar 21 '13 at 12:28

1 Answers1

6

TL;DR: It's possible, with some fancy OOP M-code trickery. Altering the behavior of () and . can be done with a Matlab wrapper class that defines subsref on top of your Java wrapper classes. But because of the inherent Matlab-to-Java overhead, it probably won't end up being any faster than normal Matlab code, just a lot more complicated and fussy. Unless you move the logic in to Java as well, this approach probably won't speed things up for you.

I apologize in advance for being long-winded.

Before you go whole hog on this, you might benchmark the performance of Java structures as called from your Matlab code. While Java field access and method calls are much faster on their own than Matlab ones, there is substantial overhead to calling them from M-code, so unless you push a lot of the logic down in to Java as well, you might well end up with a net loss in speed. Every time you cross the M-code to Java layer, you pay. Have a look at the benchmark over at this answer: Is MATLAB OOP slow or am I doing something wrong? to get an idea of scale. (Full disclosure: that's one of my answers.) It doesn't include Java field access, but it's probably on the order of method calls due to the autoboxing overhead. And if you are coding Java classes as in your example, with getter and setter methods instead instead of public fields (that is, in "good" Java style), then you will be incurring the cost of Java method calls with each access, and it's going to be bad compared to pure Matlab structures.

All that said, if you wanted to make that x = [foo(1:2).bar] syntax work inside M-code where foo is a Java array, it would basically be possible. The () and . are both evaluated in Matlab before calling to Java. What you could do is define your own custom JavaArrayWrapper class in Matlab OOP corresponding to your Java array wrapper class, and wrap your (possibly wrapped) Java arrays in that. Have it override subsref and subsasgn to handle both () and .. For (), do normal subsetting of the array, returning it wrapped in a JavaArrayWrapper. For the . case:

  • If the wrapped object is scalar, invoke the Java method as normal.
  • If the wrapped object is an array, loop over it, invoke the Java method on each element, and collect the results. If the results are Java objects, return them wrapped in a JavaArrayWrapper.

But. Due to the overhead of crossing the Matlab/Java barrier, this would be slow, probably an order of magnitude slower than pure Matlab code.

To get it to work at speed, you could provide a corresponding custom Java class that wraps Java arrays and uses the Java Reflection API to extract the property of each selected array member object and collect them in an array. The key is that when you do a "chained" reference in Matlab like x = foo(1:3).a.b.c and foo is an object, it doesn't do a stepwise evaluation where it evaluates foo(1:3), and then calls .a on the result, and so on. It actually parses the entire (1:3).a.b.c reference, turns that in to a structured argument, and passes the entire thing in to the subsref method of foo, which has responsibility for interpreting the entire chain. The implicit call looks something like this.

x = subsref(foo, [ struct('type','()','subs',{{[1 2 3]}}), ...
                   struct('type','.', 'subs','a'), ...
                   struct('type','.', 'subs','b'), ... 
                   struct('type','.', 'subs','c') ] )

So, given that you have access to the entire reference "chain" up front, if foo was a M-code wrapper class that defined subsasgn, you could convert that entire reference to a Java argument and pass it in a single method call to your Java wrapper class which then used Java Reflection to dynamically go through the wrapped array, select the reference elements, and do the chained references, all inside the Java layer. E.g. it would call getNestedFields() in a Java class like this.

public class DynamicFieldAccessArrayWrapper {
    private ArrayList _wrappedArray;

    public Object getNestedFields(int[] selectedIndexes, String[] fieldPath) {
        // Pseudo-code:
        ArrayList result = new ArrayList();
        if (selectedIndexes == null) {
            selectedIndexes = 1:_wrappedArray.length();
        }
        for (ix in selectedIndexes) {
            Object obj = _wrappedArray.get(ix-1);
            Object val = obj;
            for (fieldName in fieldPath) {
                java.lang.reflect.Field field = val.getClass().getField(fieldName);
                val = field.getValue(val);
            }
            result.add(val);
        }
        return result.toArray(); // Return as array so Matlab can auto-unbox it; will need more type detection to get array type right
    }
}

Then your M-code wrapper class would examine the result and decide whether it was primitive-ish and should be returned as a Matlab array or comma-separated list (i.e. multiple argouts, which get collected with [...]), or should be wrapped in another JavaArrayWrapper M-code object.

The M-code wrapper class would look something like this.

classdef MyMJavaArrayWrapper < handle
    % Inherit from handle because Java objects are reference-y
    properties
        jWrappedArray  % holds a DynamicFieldAccessArrayWrapper
    end
    methods
        function varargout = subsref(obj, s)
            if isequal(s(1).type, '()')
                indices = s(1).subs;
                s(1) = [];
            else
                indices = [];
            end
            % TODO: check for unsupported indexing types in remaining s
            fieldNameChain = parseFieldNamesFromArgs(s);
            out = getNestedFields( jWrappedArray, indices, fieldNameChain );
            varargout = unpackResultsAndConvertIfNeeded(out);
        end
    end
end

The overhead involved in marshalling and unmarshalling the values for the subsasgn call would probably overwhelm any speed gain from the Java bits.

You could probably eliminate that overhead by replacing your M-code implementation of subsasgn with a MEX implementation that does the structure marshalling and unmarshalling in C, using JNI to build the Java objects, call getNestedFields, and convert the result to Matlab structures. This is way beyond what I could give an example for.

If this looks a bit horrifying to you, I totally agree. You're bumping up against the edges of the language here, and trying to extend the language (especially to provide new syntactic behavior) from userland is really hard. I wouldn't seriously do something like this in production code; just trying to outline the area of the problem you're looking around.

Are you dealing with homogeneous arrays of these deeply nested structures? Maybe it would be possible to convert them to "planar organized" structures, where instead of an array of structs with scalar fields, you have a scalar struct with array fields. Then you can do vectorized operations on them in pure M-code. This would make things a lot faster, especially with save and load, where the overhead scales per mxarray.

Community
  • 1
  • 1
Andrew Janke
  • 23,508
  • 5
  • 56
  • 85
  • First off, thank you very much for your reply. There is a lot of great information there that I will be processing. To answer a couple of your questions, 1) It is almost certainly worth crossing the Matlab / Java boundary in my case because there is a fair amount of parallel processing I can do on the Java side and I don't have the toolbox to do so on the Matlab side. Initial benchmarks indicated substantial savings which is why I have proceeded to this point. You raise an interesting question about field access vs method access. I'll check that out. – PaxRomana99 Mar 20 '13 at 20:59
  • 2) Unfortunately, I am stuck with the structure organization "as is". A quick followup question, would it be better to implement the M-code layer as MCOS or UDD? – PaxRomana99 Mar 20 '13 at 21:02
  • 1) Yeah, if you're doing nontrivial work on the Java side instead of just using it as a "dumb" data container, you could see a lot of improvement. Just try to define your Java interface so it takes large arguments, maximizing the ratio of data passed per method invocation to amortize the cost of crossing the boundary. – Andrew Janke Mar 20 '13 at 22:35
  • 2) Hard to say. If you're speed obsessed, UDD was faster as of a few versions ago, but I haven't benchmarked it lately. Try in your version. (See my linked answer.) But MCOS supports `handle` semantics, which is a better match for Java object behavior in some cases. If you need to support the `obj.method(arg1, arg2)` syntax, you're stuck with MCOS. But the `method(obj, arg1, arg2)` syntax is faster and allows you to switch between MCOS and UDD implementations, so you should prefer syntax if you have a choice at this point. Plus, MCOS is supported by Mathworks now, which may trump everything. – Andrew Janke Mar 20 '13 at 22:40
  • Hmmmm...it appears that field access may be faster than method access. I wrote a test class in java with a public double and a method to return that double. I then called it 10,000 times from Matlab using both access methods. The results were the field access took 0.3297 s to run and the method access took 0.6328 s to run. – PaxRomana99 Mar 21 '13 at 13:14