I have written a C++ server app that I would like to control from Matlab. So far I have used a mex function for socket communication, but I would like to ditch the mex function and call Java directly from the m-files, which would be a more streamlined solution.

My C++-based standalone app expects a message with the following data, in the following order:

This part of the protocol is fixed and cannot be changed:

  • uint32 magic_number - this is a magic number (445566) that must be at the start of the message or the rest of the message will be ignored.

  • uint32 num_bytes - this is the number of bytes used for the rest of the message block (excluding these initial 8 bytes)

This part of the protocol was designed by me and can be changed:

  • Next comes a header made of 4 uint8 values (like an ipv4 address) signalling to the app what the following data represents (if any data follows)

  • After this, the remaining bytes can represent many different things. Most commonly this would be a string (key value) followed by a long array of floating point values (audio data). However, there may just be a string, or they may just be an array of floating point values. The 4 uint8 values let the server know what to expect here.
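As a rough illustration of the layout described above, here is a minimal Java sketch that builds the 12 leading bytes (magic number, num_bytes, 4-byte type header). The class and method names are hypothetical, and little-endian byte order is an assumption inferred from the hand-filled bytes 126, 204, 6 in the question's code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, not part of the original protocol code.
final class FrameHeader {
    static final int MAGIC = 445566;

    /** Builds the 12 leading bytes: magic, num_bytes, then the 4 type bytes. */
    static byte[] encode(byte[] typeBytes, int payloadLength) {
        if (typeBytes.length != 4) {
            throw new IllegalArgumentException("type header must be 4 bytes");
        }
        // Assumption: the server reads little-endian integers.
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(MAGIC);
        buf.putInt(4 + payloadLength); // num_bytes counts everything after the first 8 bytes
        buf.put(typeBytes);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] header = encode(new byte[] {1, 2, 3, 4}, 0);
        System.out.println(Arrays.toString(header));
    }
}
```

Java bytes are signed, so the value 204 shows up as -52 when printed; the bits on the wire are the same.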

As you can see in the code below, I am currently squeezing everything into an array of uint8 (a colossal kludge). This is because the Java "write" method expects a byte array, and a Matlab uint8 array is a compatible data type, as I found from the table in "Passing Data to a Java Method" on the MathWorks site.

I'm not a Java programmer, but I have managed to get a very simple bit of communication code up and running this afternoon. Can anyone help me make this better?

import java.net.Socket
import java.io.*

mySocket = Socket('localhost', 12345);
output_stream   = mySocket.getOutputStream;
d_output_stream = DataOutputStream(output_stream);


data = zeros(12,1,'uint8');

%Magic key: use this combination of uint8s to make
% a uint32 value of = 445566 -> massive code-smell
data(1) = 126;
data(2) = 204;
data(3) = 6;

%Size of message block:
%total number of bytes in following message including header
%This is another uint32 i.e. (data(5:8))

data(5) = 4;

%header B: a group of 4 uint8s
data(9) = 1;
data(10) = 2;
data(11) = 3;
data(12) = 4;

%Main block of floats
%????


d_output_stream.write(data,0,numel(data));


pause(0.2);
mySocket.close;

I have experimented with sending a java object composed of the different parts of the data that I would like to send, but I am not sure how they end up ordered in memory. In C/C++ it is very easy to append different data types in a contiguous block of memory and then send it. Is there a simple way for me to do this here in Java? I would eventually like to make the communications 2-way also, but this can wait for now. Thanks for reading.
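On the Java side, the closest analogue to appending mixed types into one contiguous block is probably java.nio.ByteBuffer. The sketch below packs a string key followed by an array of floats. The class name and the exact layout (UTF-8 key immediately followed by little-endian 32-bit floats, with no length prefix) are assumptions made for illustration, not the protocol from the question.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing mixed types appended into one byte array.
final class PayloadPacker {
    /** Appends the key's UTF-8 bytes, then each float as 4 little-endian bytes. */
    static byte[] pack(String key, float[] samples) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(keyBytes.length + 4 * samples.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.put(keyBytes);
        for (float s : samples) {
            buf.putFloat(s);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] payload = pack("gain", new float[] {0.5f, -0.5f});
        System.out.println(payload.length + " bytes packed");
    }
}
```

The buffer's position advances with every put, so the order of the calls is the order of the bytes in memory, much like writing through a moving pointer in C.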

learnvst
  • When I'm sending data between Matlab and Java I send java objects with the necessary properties. Perhaps you could try that. – Stefan Gretar Jan 25 '12 at 22:38
  • Can you narrow down "many different things"? Will its components always be an array of primitive numbers or something else directly representable as a Matlab primitive array? And do you need to get data back from the server? – Andrew Janke Jan 25 '12 at 23:00
  • Are you sure the "size of message block" is a single byte? That'll limit you to 256 byte messages. Is this a protocol you designed yourself? – Andrew Janke Jan 25 '12 at 23:04
  • I have updated the question to cover some of the points raised in these questions. Thanks for taking interest. – learnvst Jan 25 '12 at 23:34
  • In the case of the string (key value) followed by floats, how does the server know how long the string is, and how long the floats are? Does that 4-byte header encode lengths, or are you using fixed lengths? You might need more subheaders. Have a look at the MAT file format doco for an example of how Matlab itself does this: http://www.mathworks.com/help/pdf_doc/matlab/matfile_format.pdf – Andrew Janke Jan 26 '12 at 00:02
  • Gotcha - I had misread the example code since it filled in only 1 byte, leaving 3 as zeros by default. – Andrew Janke Jan 26 '12 at 00:06
  • I intended the 4 byte header to encode everything, but I have not sat down and checked that it would be able to do so just yet! You have (very observantly) opened up a whole different question from my original post. The 4 byte header has been sufficient for my simple test cases so far, but it has been a niggle in the back of my mind that it might be insufficient/inelegant as the project develops. Thanks for the link. – learnvst Jan 26 '12 at 00:11

1 Answer

There are at least two separate issues here. One is how to structure Matlab code that speaks a protocol like this. The other is how to represent possibly complex data in this wire protocol you have.

As far as organizing the Matlab code, you could use a class to organize the message in a more structured manner, and use typecast to convert the numbers down to bytes. Maybe something like this. This assumes your client and server have the same native representation of primitive types, and ignores network byte ordering (htonl/ntohl).

classdef learnvst_message
    %//LEARNVST_MESSAGE Message for learnvst's example problem
    %
    % Examples:
    % msg = learnvst_message;
    % msg.payload = { 'Hello world', 1:100 }
    % msg.payloadType = uint8([ 5 12 0 0 ]);  % guessing on this

    properties
        magicNumber = uint32(445566);
        payloadType = zeros(1, 4, 'uint8');  %// header B (row vector, so it concatenates with typecast output)
        payload = {};
    end

    methods
        function out = convertPayload(obj)
        %//CONVERTPAYLOAD Converts payload to a single array of bytes
        byteChunks = cellfun(@convertPayloadElement, obj.payload, 'UniformOutput',false);
        out = cat(2, byteChunks{:});
        end

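        function out = marshall(obj)
        %//MARSHALL Converts the message to wire-format bytes
        payloadBytes = convertPayload(obj);
        messageSize = uint32(4 + numel(payloadBytes)); %// ex first 8 bytes
        %// Wire order: magic number, message size, then the 4-byte payload type
        out.headerBytes = [
            typecast(obj.magicNumber, 'uint8') ...
            typecast(messageSize, 'uint8') ...
            reshape(obj.payloadType, 1, [])];
        out.payloadBytes = payloadBytes;
        end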

        function sendTo(obj, host, port)
        %//SENDTO Sends the marshalled message to host:port
        m = marshall(obj);
        mySocket = java.net.Socket(host, port);  %// fully qualified: no import is in scope here
        d_output = mySocket.getOutputStream();
        d_output.write(m.headerBytes, 0, numel(m.headerBytes));
        d_output.write(m.payloadBytes, 0, numel(m.payloadBytes));
        d_output.flush();
        mySocket.close();
        end

    end
end

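function out = convertPayloadElement(x)
%//CONVERTPAYLOADELEMENT Converts one payload element to a row vector of bytes
if isnumeric(x)
    %// Flatten to a row first so the chunks concatenate with cat(2, ...)
    out = typecast(x(:).', 'uint8');
elseif ischar(x)
    %// Chars can't be typecast directly; assumes receiver likes 16-bit Unicode chars
    out = typecast(uint16(x(:).'), 'uint8');
else
    %// ... fill in other types here ...
    %// or define a payload_element class that marshalls itself and call
    %// it polymorphically
    error('Unsupported payload element type: %s', class(x));
end
end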

More readable, I think, and a bit less code smell. As a caller, you can work with the data in a more structured form, and it encapsulates the conversion to the wire-protocol bytes inside the class's marshalling method. That "convertPayload" is what "stitches together a generic block of memory together made of many different data types". In Matlab, a uint8 array is a way to append representations of different data types together in a contiguous block of memory. It's basically a wrapper around an unsigned char [], with automatic reallocation. And typecast(...,'uint8') is sort of the equivalent of doing a reinterpret cast to char * in C/C++. See the help for both of them.
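Because typecast emits bytes in the machine's native order, the byte-order caveat mentioned earlier matters as soon as the two ends differ, or as soon as you switch to Java's DataOutputStream.writeInt, which always writes the high byte first (network order, what htonl produces on a little-endian machine). A small Java sketch of the two orderings for the magic number follows; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo of how one 32-bit value serializes in each byte order.
final class ByteOrderDemo {
    /** Returns the 4 bytes of value in the requested byte order. */
    static byte[] intBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    /** Formats bytes as space-separated unsigned hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("little-endian: " + hex(intBytes(445566, ByteOrder.LITTLE_ENDIAN)));
        System.out.println("big-endian:    " + hex(intBytes(445566, ByteOrder.BIG_ENDIAN)));
    }
}
```

The little-endian form (7E CC 06 00) matches the bytes 126, 204, 6 that the question fills in by hand, which suggests the server expects little-endian integers.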

But this brings up more questions. How does the server know how long each component of the payload is, what its shape is if multidimensional, and what its type is? Or what if they're complex data types: could they nest? You might need to embed little headers inside each of the payload elements. The code above assumes the 4-byte payload type header fully describes the payload contents.
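One common way to embed such little headers is a type-length-value (TLV) layout, where every element is prefixed with a type tag and a byte count. The Java sketch below is only an illustration: the tag values, the 1-byte tag plus 4-byte little-endian length, and all names are made up here and are not part of the existing protocol. It does no validation of malformed input.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical TLV element format: [1-byte type][4-byte length][value bytes].
final class Tlv {
    static final byte TYPE_STRING = 1;  // made-up tag value
    static final byte TYPE_FLOAT32 = 2; // made-up tag value

    /** Prefixes value with its type tag and its length in bytes. */
    static byte[] encode(byte type, byte[] value) {
        ByteBuffer buf = ByteBuffer.allocate(5 + value.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(type);
        buf.putInt(value.length);
        buf.put(value);
        return buf.array();
    }

    /** Walks a payload of concatenated elements and returns each value, in order. */
    static List<byte[]> decodeValues(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN);
        List<byte[]> values = new ArrayList<>();
        while (buf.remaining() >= 5) {
            buf.get();                  // type tag, ignored in this sketch
            int length = buf.getInt();  // number of value bytes that follow
            byte[] value = new byte[length];
            buf.get(value);
            values.add(value);
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] element = encode(TYPE_STRING, new byte[] {'k', 'e', 'y'});
        System.out.println(decodeValues(element).size() + " element decoded");
    }
}
```

With per-element lengths on the wire, the receiver can skip elements whose type it does not recognize, so the 4-byte type header no longer has to describe everything.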

Sounds like what you're looking for may be a sort of self-describing format for heterogeneous array based data. There are existing formats for that, including NetCDF, HDF5, and Matlab's own MAT files. Matlab has built-in support for them, or you could pull in third-party Java libraries for them.

As far as speed goes, you're going to have to pay each time you pass data across the Matlab/Java boundary. Large primitive arrays are relatively cheap to convert, so you probably want to pack most of the message up in a byte array in Matlab before passing it to Java, instead of making lots of separate write() calls. It'll depend in practice on how big and complex your data is. See "Is MATLAB OOP slow or am I doing something wrong?" for a rough idea of the cost of some Matlab operations, including Java calls. (Full disclosure: that's a self-plug.)
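An alternative way to keep the number of boundary crossings low is to do the packing on the Java side: a small helper class that takes whole primitive arrays in one call and issues a single write(). The sketch below assumes such a class is compiled and placed on Matlab's Java class path; the class and method names are hypothetical, and the little-endian layout is an assumption carried over from the question's byte values.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: one call from Matlab, one write() to the socket.
final class AudioSender {
    static final int MAGIC = 445566;

    /** Builds the complete message: magic, num_bytes, type header, float samples. */
    static byte[] frame(byte[] typeBytes, float[] samples) {
        if (typeBytes.length != 4) {
            throw new IllegalArgumentException("type header must be 4 bytes");
        }
        int numBytes = 4 + 4 * samples.length; // everything after the first 8 bytes
        ByteBuffer buf = ByteBuffer.allocate(8 + numBytes).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(MAGIC);
        buf.putInt(numBytes);
        buf.put(typeBytes);
        for (float s : samples) {
            buf.putFloat(s);
        }
        return buf.array();
    }

    /** Writes the framed message with a single write() call. */
    static void send(OutputStream out, byte[] typeBytes, float[] samples) throws IOException {
        out.write(frame(typeBytes, samples));
        out.flush();
    }

    public static void main(String[] args) {
        byte[] message = frame(new byte[] {1, 2, 3, 4}, new float[] {0.25f, 0.5f});
        System.out.println(message.length + " bytes framed");
    }
}
```

Whether this beats packing in Matlab with typecast would need to be measured with profile; both approaches hand Java one large array rather than many small values.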

Andrew Janke
  • Wow! Thanks for such an insightful response and thanks for your time. It is very late at night here at the moment, so I'll experiment with it in the morning, benchmark it against my current mex solution, and post the results. I'll be sure to read into the existing formats that you mentioned. Thanks again! – learnvst Jan 26 '12 at 00:05
  • Glad to help; I don't get to use this low-level Matlab stuff very often. (Which is fine - one `typecast()` per million lines of code sounds about right.) Don't forget to use `profile` to see where the time is being spent. Good luck. – Andrew Janke Jan 26 '12 at 00:37
  • Erm, "low-level Matlab" sounds like an oxymoron, yes. I said I was going to sleep, but I lied and had to give it a shot. I'm getting an error when using your example code: "The first input argument must be a full, non-complex numeric value. Error in ==> Janke>@(x)typecast(x,'uint8') at 20 byteChunks = cellfun(@(x) typecast(x, 'uint8'), obj.payload);" ... definitely off to sleep now. Thanks again. – learnvst Jan 26 '12 at 00:48
  • You're right. Riddled with other bugs too. I refactored the code so I could test it (without getting to the point of setting up a socket) and fixed some of them. There's the first edge case - chars can't typecast directly. – Andrew Janke Jan 26 '12 at 01:09
  • OK, after ironing out a few insignificant bugs, I managed to get this working well. I tested by sending 100,000 samples of single precision audio data to my app + the header info. The mex solution took an average of 9ms and the Matlab solution also averaged out to about 9ms. This result was unexpected, and I am very happy! I will make the code a little more pretty and then post my final solution later on. Thanks again! – learnvst Jan 26 '12 at 09:32
  • For shorter data blocks (100 samples) they both take ~6ms. All good. – learnvst Jan 26 '12 at 09:38