I have a MATLAB program that reads a large amount of data from a file on disk and performs an intensive computation, like this:

data = load('myfile.dat');
results = intensiveCompute(data);

The computation runs on the GPU and takes a very long time. I'd like to load the data from the next file while the computation is running, since loading the file is also a bottleneck. From what I've gathered so far, this is doable using MEX (e.g. _beginthread etc.). However, it would be ideal to stay within the MATLAB environment if possible. Perhaps there's some way to spawn one thread in MATLAB to read the data and another to perform the computation. Any help is greatly appreciated.

user1715925
  • wouldn't `parfor` give you this behavior? – Shai Oct 24 '13 at 21:02
  • maybe i'm missing something, but how would you use parfor to execute 2 different tasks in parallel? I've marked chappjc's solution as answer by the way but if there's some clever way of using parfor it'd be great to know. – user1715925 Oct 25 '13 at 18:30

2 Answers

In this answer I detailed an approach using the task and job functions for asynchronous execution, but for a simple load I think parfeval might be easiest. For example:

f = parfeval(@load,1,'myfile.dat'); % asynchronous, move on to intensiveCompute
results = intensiveCompute(data);
data = fetchOutputs(f); % Blocks until complete
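
To extend the snippet above to a sequence of files, one possible arrangement is to start loading file k+1 on a worker before computing on file k. This is only a sketch under assumptions: the temporary file names and the anonymous `intensiveCompute` stand-in are hypothetical and exist only so the example runs on its own. It also assumes a release where `parfeval` can be called without an explicit pool argument, as in the snippet above.

```matlab
% Stand-ins so the sketch is self-contained (assumptions, not from the question)
intensiveCompute = @(d) sum(d(:));          % placeholder for the real GPU computation
files = cell(1, 3);
for k = 1:numel(files)
    files{k} = fullfile(tempdir, sprintf('prefetch_demo_%d.dat', k));
    m = k * ones(4);
    save(files{k}, 'm', '-ascii');          % text file that load() returns as a matrix
end

results = cell(size(files));
data = load(files{1});                      % the first load is synchronous
for k = 1:numel(files)
    if k < numel(files)
        % Start loading the next file on a pool worker and return immediately
        f = parfeval(@load, 1, files{k+1});
    end
    results{k} = intensiveCompute(data);    % runs on the client while the worker loads
    if k < numel(files)
        data = fetchOutputs(f);             % blocks only if the load is still running
    end
end
```

Keep in mind that `fetchOutputs` has to transfer the loaded data from the worker back to the client, so for very large files it is worth measuring whether the overlap actually saves time.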

Note: Be sure to allow incoming connections in the Windows firewall for MATLAB.exe, smpd.exe and mpiexec.exe. You should be prompted the first time a pool is launched (automatically by parfeval).

Here's a simple example to show how it works:

>> x = magic(5);
>> save x.mat x
>> f = parfeval(@load,1,'x.mat');
Starting parallel pool (parpool) using the 'local' profile ... connected ...
>> f
f = 
 FevalFuture with properties: 

                   ID: 1
             Function: @load
                State: running
      ErrorIdentifier: 
         ErrorMessage: 

At this point, we see that the command is still running on the worker. Obviously, we can be doing something more useful than simply checking on the job... but here's what happens after a brief wait:

>> f
f = 
 FevalFuture with properties: 

                   ID: 1
             Function: @load
                State: finished (unread)
      ErrorIdentifier: 
         ErrorMessage: 
>> % all done, load the data
>> data = fetchOutputs(f) % Blocks until complete
data = 
x: [5x5 double]
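
Instead of checking the future by hand at the prompt, the same check can be done in code. The following is a rough sketch, assuming the `State` property reports `'finished'` (or `'failed'` on newer releases) once the worker is done. The temporary file name is hypothetical, and the `try`/`catch` relies on `fetchOutputs` throwing an error if the background call failed.

```matlab
x = magic(5);
fname = fullfile(tempdir, 'x_demo.mat');    % hypothetical temporary file
save(fname, 'x');

f = parfeval(@load, 1, fname);              % returns immediately

% Poll the future; useful client-side work would replace the pause
while ~any(strcmp(f.State, {'finished', 'failed'}))
    pause(0.1);
end

try
    data = fetchOutputs(f);                 % struct with field x, as in the session above
catch err
    fprintf('Background load failed: %s\n', err.message);
end
```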
chappjc

I know you mentioned you want to stay within Matlab, and as chappjc suggests you can use the Parallel Computing Toolbox, but most of us don't have lots of toolboxes.

Is your data only in the MAT-file format, or is it available in some other format like CSV or HDF5? If you know Java or have access to someone who can program in it, I would suggest using Java threads, since Matlab runs on Java and has high-performance marshalling of data between Java and MATLAB. Then you don't have to worry about MEX files.

Jason S
  • thanks for the suggestions. I do have PCT in this case so chappjc solution works. Your comment raises another question though, does it mean marshaling between C++ and Matlab is not as efficient as with Java? If so do you know if any benchmark has been done to compare it? – user1715925 Oct 25 '13 at 18:32
  • @user1715925 - No, MEX files can access MATLAB data directly and manipulate variables in the MATLAB workspace -- the [MEX API](http://www.mathworks.com/help/matlab/matlab_external/c-c-source-mex-files.html) is very high performance. Also, a fine point is that the MATLAB engine is NOT based on Java, just the GUI. However, like Jason S. says, you have full access to all of Java's capabilities from the command line if the JVM is started (it is by default, but doesn't have to be). – chappjc Oct 25 '13 at 19:21