
I have created a monster, or at least a lot of MATLAB handle classes that point to each other. For instance, an experiment (handle) is a set of tracks (handles), which contain runs (handles) and reorientations (handles). The tracks point back to the experiment that contains them, the runs and reorientations point back to the track they came from, and they also point ahead and behind to the next and previous run & reorientation.

I have come to realize that all this cross-pointing may confuse MATLAB when it comes time to load or save files, so as much as I can I've declared the back-pointing handles as Transient and used set.property methods to restore the back pointers. For instance:

classdef Track < handle
   properties(Transient = true)
      expt;
   end
end

classdef Experiment < handle
   properties(AbortSet = true)
      track;
   end
   methods
      function set.track(obj, value)
         if (~isempty(value) && isa(value, 'Track'))
            value.expt = obj;
         end
         obj.track = value;
      end
   end
end

This seems to have sped up loading from disk somewhat, but I think I am still missing things.

I can save an experiment to disk, creating a 48 MB file, in about 7 seconds. But it then takes 3 minutes to load that file back from disk. I have tried to use the profiler to locate the slow points, but it reports a total time of only ~50 milliseconds.

Questions:

Does anyone have experience saving handle objects to disk, and can you recommend general practices to speed up loading?

Is there any way to get the profiler to report what MATLAB is doing with the other 179.95 seconds, or a systematic way to determine what is slowing down the loading without using the profiler?

Marc
  • You forgot to attach the code you're using to save and load the data. Without knowing this code it's impossible to give you a worthwhile answer. – Yair Altman Jul 02 '10 at 14:48
  • I'm saving the files using the standard MATLAB save and load commands to save to .mat files. – Marc Jul 03 '10 at 03:30
  • Handle objects generally have terrible performance, see here: http://stackoverflow.com/questions/1446281/matlabs-garbage-collector/1489751#1489751 – Mikhail Poda Aug 29 '10 at 07:14
  • That post relates to memory-deallocation problems, especially with nested handles. I'm not sure that this is the root cause of the problem with loading objects from disk. I think the fundamental cause of my problem was that I organized my data along C++/Java lines, with small objects as the fundamental units, organized into arrays of objects. MATLAB is much faster when dealing with arrays of data. My provisional solution has been to declare as Transient as many fields as I can, and then recompute them on loading, which is much faster. – Marc Aug 30 '10 at 14:49
  • MATLAB tries to deallocate memory (runs the garbage collector) on each function call, therefore both forms of references (handle objects and nested functions) have worse performance than value objects. I'm also not sure that this is the root cause of the problem with loading objects from disk. – Mikhail Poda Sep 01 '10 at 17:52

3 Answers


I do not save handle objects to disk. Instead, I have custom save/load methods that copy the information from the handle objects into structures for saving, and then reconstruct the objects and their dependencies on loading.

Thus, loading is reasonably fast, and I can have a patch method that allows me to update the structure (or some of the data contained therein) before I send it to the class constructor.
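A minimal sketch of the saving half of that round trip, assuming a common superclass that all the handle classes inherit from (the names `Saveable` and `saveTo` are illustrative, not Jonas's actual code):

```matlab
classdef Saveable < handle
    % hypothetical superclass: dump public properties into a plain struct,
    % so that only plain data (no handle graph) is written to disk
    methods
        function saveTo(obj, filename)
            s = struct();
            for p = properties(obj)'    % cellstr of public property names
                s.(p{1}) = obj.(p{1});  % copy each property into the struct
            end
            save(filename, 's');
        end
    end
end
```

The corresponding load side would read the struct back and hand it to the class constructor, which can then rebuild any back pointers itself.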

For the profiler issue: I guess MATLAB is lumping this time into 'overhead' somewhere. In my experience that is very difficult to track down.

Jonas
  • I think MATLAB supports the conversion to structures with the "saveobj" and "loadobj" methods you can define for each class. The problem I foresee is that I don't want to have to write these methods for every subclass. – Marc Jul 03 '10 at 03:33
  • That's why I created save and load methods in my superclass, which are inherited. If some properties should be handled differently from others, you can either write your methods so that they recognize something different about those properties, or you can have a hidden property in each subclass that lists the 'special' properties. – Jonas Jul 03 '10 at 12:32
  • but how do you deal with calling the constructor in the loadobj method? E.g. bar < foo. If I define loadobj only in foo, when I try to load a bar from disk, won't I end up with a foo? – Marc Jul 05 '10 at 19:02
  • Again, I don't use loadobj methods. My load method looks like this: 1. load the structure; 2. `constructor = str2func(loadedStruct.class);`; 3. `obj = constructor(loadedStruct)`. In other words, the load method of `foo` loads the structure that contains the information about a `bar`, including a `class` field with the class name "bar", and then the load method of `foo` calls the constructor of `bar` with the structure as input. – Jonas Jul 06 '10 at 01:42
  • Clever. Actually, I think this will also work in the saveobj/loadobj paradigm, as there's no reason saveobj can't save the classname as well. – Marc Jul 06 '10 at 14:14
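The dispatch trick described in the comments above can be spelled out in a few lines (the file name and the `class` field name are assumptions):

```matlab
loaded = load('experiment.mat');    % 1. load the plain structure
s = loaded.s;
constructor = str2func(s.class);    % 2. look up the concrete class, e.g. 'bar'
obj = constructor(s);               % 3. the subclass constructor rebuilds the object
```

Because `str2func` resolves the class name at load time, a load method defined only on the superclass still produces the correct subclass.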

I haven't worked with handle objects, but in general, there is per-mxarray overhead in saving and loading, so optimizing MAT files is a matter of converting the data in them to a form with fewer mxarrays. An mxarray is a single-level array structure. For example:

strs = {'foo', 'bar', 'baz'};

The strs array contains 4 mxarrays: one cell array and 3 char arrays.

To speed up saving and loading, try doing the following when saving, and the inverse when loading:

- Convert cellstr to 2-D char
- Convert record-organized structs and objects to planar-organized
- Eliminate redundant objects by storing a canonical set of values in one array and replacing the object instances with indexes into that array. (This is probably not relevant for handles, which inherently behave this way.)
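For example, the cellstr-to-char conversion collapses N+1 mxarrays into one, at the cost of space padding (values made up):

```matlab
strs = {'foo'; 'bar'; 'bazzz'};  % 4 mxarrays: 1 cell array + 3 char arrays
c = char(strs);                  % one 3x5 char matrix, right-padded with spaces
back = cellstr(c);               % inverse; cellstr strips the trailing padding
```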

"Record-organized" means an array of N things is represented as an N-long array of structs with scalar fields; "planar-organized" means it's represented as a scalar struct containing N-long arrays in its fields.
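A small illustration of the two layouts (values made up):

```matlab
% record-organized: N structs with scalar fields -> 2N+1 mxarrays (5 for N=2)
recs(1) = struct('x', 1, 'y', 10);
recs(2) = struct('x', 2, 'y', 20);

% planar-organized: one scalar struct with N-long array fields -> 3 mxarrays
planar.x = [recs.x];   % [1 2]
planar.y = [recs.y];   % [10 20]
```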

See if you can convert your in-memory object graph to a normalized form that fits in a few large primitive arrays, similar to how you might store it in SQL: the object properties for all the objects in one set of arrays, and the handle relationships as (id, id) tuples held in numeric arrays, perhaps using indexes into the property arrays as your object ids.
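Sketching that normalization for the experiment/track/run graph in the question (all field names here are hypothetical):

```matlab
% one planar struct per class; the row index serves as the object id
S.track.length = [120; 95; 210];       % property arrays, one row per track
S.run.duration = [3.5; 1.2; 2.8; 4.0]; % one row per run
S.run.trackId  = [1; 1; 2; 3];         % (run id -> parent track id) relationship
save('expt_flat.mat', '-struct', 'S'); % saves fields of S as top-level variables
```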

A saveobj and loadobj defined at the "top" class in your object graph could do the conversion.

Also, if you are using network file systems, try doing your saving and loading on a local filesystem with temporary copies. For reading, copy the MAT file to tempdir and then load() from there; for writing, save() to tempdir and then copy it to the network drive. In my experience, save() and load() are substantially faster with local I/O, enough that it's a big net win (2x-3x speedup) even with the time to do the copies. Use tempname() to pick temp files.
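That pattern might look like this (`networkFile` is a placeholder path):

```matlab
networkFile = '\\server\share\experiment.mat';  % placeholder network path

% load: copy to local disk first, then load from there
tmp = [tempname() '.mat'];
copyfile(networkFile, tmp);
data = load(tmp);
delete(tmp);

% save: write to local disk first, then move to the network drive
tmp = [tempname() '.mat'];
save(tmp, '-struct', 'data');
movefile(tmp, networkFile);
```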

With the profiler, are you using the "-timer real" option? By default, "profile" shows CPU time, and this is I/O-centric stuff. With "-timer real", you should see those other 180 seconds of wall time attributed to save() and load(). Unfortunately, since they're builtins, the profiler won't let you see inside them, and that might not help much.
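Concretely, and folding in the caveat that the profiler skips top-level builtins, you can wrap the call in a function handle so the time gets attributed somewhere visible:

```matlab
loadIt = @() load('expt.mat');  % wrap load() so the profiler can see the call
profile on -timer real          % wall-clock time instead of CPU time
data = loadIt();
profile off
profile viewer
```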

Andrew Janke
  • Thanks for the tips. I tried running the profiler with the -timer real flag and got the same results, because the profiler ignores top-level builtins. So I put the load command in an anonymous function and ran it again. As you predicted, I get the helpful result that "Self time (built-ins, overhead, etc.)" accounts for 100% of the load time. – Marc Jul 06 '10 at 23:06

Have you tried the different options to SAVE, such as -v7.3? I believe there are some performance differences when using that format (it is HDF5-based).
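For reference, the format is chosen per call with a flag (file and variable names are placeholders):

```matlab
save('expt.mat', 'expt', '-v7.3');  % HDF5-based; required for variables > 2 GB
save('expt.mat', 'expt', '-v6');    % older uncompressed format, for comparison
```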

Edric
  • I think at one point I had the default setting to 7.3 and it took long enough to save that I killed MATLAB using the task manager. There's no difference between the other two formats (compressed & uncompressed) in save or load time. – Marc Jul 03 '10 at 03:35