
Since MATLAB R2018a, complex-valued matrices are stored internally as a single data block, with the real and imaginary components of each matrix element stored next to each other; MathWorks calls this "interleaved complex". (Previously, such matrices had two data blocks, one for all real components and one for all imaginary components: "separate complex".)

I figure that, since the storage now allows for it, it should be possible to cast a complex-valued array to a real-valued array with twice as many elements, without copying the data.

MATLAB has a function `typecast`, which casts an array to a different type without copying data. It can be used, for example, to cast an array of 16 8-bit values to an array of 2 double floats. No data is copied; the bit pattern is simply re-interpreted as the new type.
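
For example (a minimal illustration of this documented behavior):

x = uint8(1:16);            % 16 bytes
y = typecast(x, 'double');  % the same 16 bytes re-interpreted as 2 doubles
assert(numel(y) == 2)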

Sadly, this function does not work at all on complex-valued arrays.
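
A quick way to see the failure (the exact error text may differ between releases):

try
    typecast(complex(1, 2), 'double')
catch err
    disp(err.message)   % typecast refuses complex input
end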

I'm looking to replicate this code:

A = fftn(randn(40,60,20)); % some random complex-valued array
assert(~isreal(A))

sz = size(A);
B = reshape(A,1,[]);        % make into a vector
B = cat(1,real(B),imag(B)); % interleave real and imaginary values
B = reshape(B,[2,sz]);      % reshape back to original shape, with a new first dimension
assert(isreal(B))

The arrays A and B have (in R2018a and newer) the exact same data, in exactly the same order. However, to get to B we had to copy the data twice.
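
As a sanity check (not a solution, just verifying that B really holds the same values in the same order):

A2 = reshape(complex(B(1,:,:,:), B(2,:,:,:)), sz);  % undo the interleaving
assert(isequal(A, A2))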

I tried creating a MEX-file that does this, but I don't see how to create a new array that references the data in the input array. The MEX-file below works, but it causes MATLAB to crash when clearing variables, because two arrays reference the same data without MATLAB knowing that they share it (i.e. the reference count is not incremented).

// Build with:
//    mex -R2018a typecast_complextoreal.cpp

#include <mex.h>

#if MX_HAS_INTERLEAVED_COMPLEX==0
#error "This MEX-file must be compiled with the -R2018a flag"
#endif

#include <vector>

void mexFunction(int /*nlhs*/, mxArray* plhs[], int nrhs, const mxArray* prhs[]) {

   // Validate input
   if(nrhs != 1) {
      mexErrMsgTxt("One input argument expected");
   }
   if(!mxIsDouble(prhs[0]) && !mxIsSingle(prhs[0])) {
      mexErrMsgTxt("Only floating-point arrays are supported");
   }

   // Get input array sizes
   mwSize nDims = mxGetNumberOfDimensions(prhs[0]);
   mwSize const* inSizes = mxGetDimensions(prhs[0]);

   // Create a 0x0 output matrix of the same type, but real-valued
   std::vector<mwSize> outSizes(nDims + 1, 0);
   plhs[0] = mxCreateNumericMatrix(0, 0, mxGetClassID(prhs[0]), mxREAL);

   // Set the output array data pointer to the input array's
   // NOTE! This is illegal, and causes MATLAB to crash when freeing both
   // input and output arrays, because it tries to free the same data
   // twice
   mxSetData(plhs[0], mxGetData(prhs[0]));

   // Set the output array sizes
   outSizes[0] = mxIsComplex(prhs[0]) ? 2 : 1;
   for(size_t ii = 0; ii < nDims; ++ii) {
      outSizes[ii + 1] = inSizes[ii];
   }
   mxSetDimensions(plhs[0], outSizes.data(), outSizes.size());
}
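
For completeness, this is the kind of call sequence that triggers the crash (a hypothetical session, using the function name from the build comment above):

A = fftn(randn(40,60,20));       % complex-valued input
B = typecast_complextoreal(A);   % B now aliases A's data block
size(B)                          % 2 40 60 20
clear A B                        % freeing the shared data block twice crashes MATLAB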

I'd love to hear of any ideas on how to proceed from here. I don't necessarily need to fix the MEX-file, if the solution is purely MATLAB code, so much the better.

Cris Luengo
  • I would suggest not doing it. The underlying representation of Matlab data structures in main memory might change in the future, or at least, it is never promised that it won't change. – John Z. Li Oct 08 '18 at 07:56
  • @JohnZ.Li: Yes, I agree with you. There are some MEX-files written using the reverse-engineered layout of the `mxArray` header. That is risky, and I wouldn't want to use such information to try to tweak the reference count or something. But I don't have a problem using undocumented functions that properly update reference counts, such as `mxCreateSharedDataCopy`. – Cris Luengo Oct 08 '18 at 19:17

3 Answers


This question reminded me of some blog posts regarding in-place memory editing through MEX, about which the following was said:

[M]odifying the original data directly is both discouraged and not officially supported. Doing it incorrectly can easily crash Matlab. ref

This is, in the best case, an unmaintainable mess. ref

Having said that, I don't have a solution for you, but I might be able to suggest a workaround.

Seeing as MATLAB allows us to call Python libraries, we can perform this sort of manipulation within Python, and only bring the data back into MATLAB when we reach a stage where proceeding in Python is impossible or undesired. Depending on when that stage is reached, this idea could be either a valid approach or a completely useless suggestion.

Take a look at the example below; it should be pretty self-explanatory:

np = py.importlib.import_module('numpy');
sp = py.importlib.import_module('scipy.fftpack');

% Create a complex double array in Python:
arrC = sp.fftn(np.random.rand(uint8(4), uint8(3), uint8(2)));
%{
arrC = 
  Python ndarray with properties:

           T: [1×1 py.numpy.ndarray]
        base: [1×1 py.NoneType]
      ctypes: [1×1 py.numpy.core._internal._ctypes]
        data: [1×4 py.memoryview]
       dtype: [1×1 py.numpy.dtype]
       flags: [1×1 py.numpy.flagsobj]
        flat: [1×1 py.numpy.flatiter]
        imag: [1×1 py.numpy.ndarray]
    itemsize: [1×1 py.int]
      nbytes: [1×1 py.int]
        ndim: [1×1 py.int]
        real: [1×1 py.numpy.ndarray]
       shape: [1×3 py.tuple]
        size: [1×1 py.int]
     strides: [1×3 py.tuple]
    [[[ 13.99586491+0.j           0.70305071+0.j        ]
      [ -1.33719563-1.3820106j   -0.74083670+0.25893033j]
      [ -1.33719563+1.3820106j   -0.74083670-0.25893033j]]

     [[ -0.43914391+0.8336674j    0.08835445-0.50821244j]
      [  1.07089829-0.35245746j   0.44890850-0.9650458j ]
      [  2.09813180+1.34942678j  -1.20877832+0.71191772j]]

     [[ -2.93525342+0.j          -0.69644042+0.j        ]
      [  0.16165913-1.29739125j  -0.84443177+0.26884365j]
      [  0.16165913+1.29739125j  -0.84443177-0.26884365j]]

     [[ -0.43914391-0.8336674j    0.08835445+0.50821244j]
      [  2.09813180-1.34942678j  -1.20877832-0.71191772j]
      [  1.07089829+0.35245746j   0.44890850+0.9650458j ]]]
%}

% Make sure that python sees it as a "complex double" (aka complex128)
assert( isequal(arrC.dtype, np.dtype(np.complex128)) );

% Return a (real) double view:
arrR = arrC.view(np.float64);
%{
arrR = 
  Python ndarray with properties:

           T: [1×1 py.numpy.ndarray]
        base: [1×1 py.numpy.ndarray]
      ctypes: [1×1 py.numpy.core._internal._ctypes]
        data: [1×4 py.memoryview]
       dtype: [1×1 py.numpy.dtype]
       flags: [1×1 py.numpy.flagsobj]
        flat: [1×1 py.numpy.flatiter]
        imag: [1×1 py.numpy.ndarray]
    itemsize: [1×1 py.int]
      nbytes: [1×1 py.int]
        ndim: [1×1 py.int]
        real: [1×1 py.numpy.ndarray]
       shape: [1×3 py.tuple]
        size: [1×1 py.int]
     strides: [1×3 py.tuple]
    [[[ 13.99586491   0.           0.70305071   0.        ]
      [ -1.33719563  -1.3820106   -0.7408367    0.25893033]
      [ -1.33719563   1.3820106   -0.7408367   -0.25893033]]

     [[ -0.43914391   0.8336674    0.08835445  -0.50821244]
      [  1.07089829  -0.35245746   0.4489085   -0.9650458 ]
      [  2.0981318    1.34942678  -1.20877832   0.71191772]]

     [[ -2.93525342   0.          -0.69644042   0.        ]
      [  0.16165913  -1.29739125  -0.84443177   0.26884365]
      [  0.16165913   1.29739125  -0.84443177  -0.26884365]]

     [[ -0.43914391  -0.8336674    0.08835445   0.50821244]
      [  2.0981318   -1.34942678  -1.20877832  -0.71191772]
      [  1.07089829   0.35245746   0.4489085    0.9650458 ]]]
%}

% Do something else with it in python
...

% Bring data to MATLAB:
sz = cellfun(@int64, cell(arrR.shape));
B = permute(reshape(double(py.array.array('d', arrR.flatten('C').tolist())),sz),[2,1,3]);

The last stage might be inefficient enough to negate the performance/memory gains from not copying the data earlier, so it would probably be a good idea to keep it for the very end, and reduce the data as much as possible beforehand.


While experimenting with this, I came to realize that there's considerable difficulty1 involved in transferring non-scalar complex data from MATLAB to Python and back (just compare the MATLAB output for np.asarray(1+1i) vs np.asarray([1+1i, 1+1i])). Possibly, the reason for this is the limited support for complex types outside ndarray objects in Python.

If you (as opposed to me) know what to do with memoryview objects (i.e. the content of the data field of the ndarray objects), you could get a pointer and perhaps pass it on to C to get some useful results.


1 It's possible, but you have to num2cell your data so that it can be transferred as a Python list (and vice versa); see the sketch below. The situation can also be improved by editing some MATLAB files.
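
A minimal sketch of that cell-based transfer (relying on MATLAB's default scalar conversions; not extensively tested):

A = [1+2i, 3-4i];
pyList = py.list(num2cell(A));   % each complex scalar becomes a Python complex number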

Dev-iL

Here is the relevant wording from the documentation:

Since many mathematical libraries use an interleaved complex representation, using the same representation in your MEX functions eliminates the need to translate data. This simplifies your code and potentially speeds up the processing when large data sets are involved.

According to the example in the documentation, each complex type appears to be a struct with real and imaginary components:

typedef struct {
    double real;
    double imag;
} mxComplexDouble ;

So we want to reinterpret-cast an array of mxComplexDouble to another complex double type that is used in a mathematical library.

For example, a mathematical library may have defined its complex type as:

typedef struct {
    double real;
    double imag;
} other_complex ;

And it has defined a function to consume an array of its own complex type.

void do_something(const other_complex* math_array);

Here we want to send a MATLAB complex array to the math library:

mxComplexDouble matlab_array[5];
other_complex* math_array =  (other_complex*)matlab_array;
do_something(math_array);

Because it needs just a pointer conversion, it might be expected to speed up the process.

But because of the strict aliasing rule*, any use of math_array results in undefined behavior. Even casting to an array of doubles is prohibited:

double* double_array = (double*)matlab_array; 
printf("%f",double_array[0]); // Undefined behavior

Therefore, we need to allocate a new array and copy the data byte by byte using memcpy. That is the safe way to prevent undefined behavior.

The only exception to the strict aliasing rule is the standard complex data types defined in the headers complex.h and complex in C and C++, respectively, and only the floating-point (float, double and long double) complex types are defined there. So we can safely cast a std::complex<double>* to a double*.

Conclusions:

  1. typecast may have been prohibited for the new MATLAB complex data types because of the strict aliasing rule, but as explained above, the new complex data types cannot be used that cheaply by other libraries anyway. Copying the whole data with memcpy may be considered the only efficient, well-defined option.

  2. Using typecast on other data types also seems doubtful. I can (and probably should) assume that MATLAB uses some compiler tricks to prevent undefined behavior; otherwise, accessing a data element that MATLAB hasn't copied byte by byte would itself be undefined behavior. (Note that casting between compatible types like int32 and uint32, and casting any type to the C++ char type, have well-defined behavior.) Moreover, it requires that all MEX files be compiled with the proper option to disable strict aliasing; but we currently have plenty of compiled MEX files for which passing the result of a typecast leads to undefined behavior. So the use of typecast should be restricted as much as possible.

* I have found some blogs that explain the concept better than the SO post, for example here, here, or here.

rahnema1

See this FEX submission, which can do the complex --> 2 reals reinterpretation without copying the data (it can even point to interior contiguous sub-sections of the data without a copy):

https://www.mathworks.com/matlabcentral/fileexchange/65842-sharedchild-creates-a-shared-data-copy-of-a-contiguous-subsection-of-an-existing-variable

If you are simply reading and writing interleaved complex data files in R2018a and later, see this FEX submission:

https://www.mathworks.com/matlabcentral/fileexchange/77530-freadcomplex-and-fwritecomplex

James Tursa
  • Note that the above FEX submission (a C MEX routine) does use mxArray header hacks to accomplish the data sharing, and also uses the undocumented mxCreateSharedDataCopy API function. Either of these could cause the code to break in future versions of MATLAB (if the header definition changes again, or if mxCreateSharedDataCopy is removed from the API library). The current code can be compiled in either the R2017b or R2018a memory model and has been tested in various 32-bit and 64-bit Windows versions, R2011a through R2018a. – James Tursa Oct 12 '18 at 21:26
  • This looks very promising, thanks! I will try it out tonight. I thought that [`mxCreateSharedDataCopy` is no longer supported in R2018a?](https://www.mathworks.com/matlabcentral/answers/396103-mxcreateshareddatacopy-no-longer-supported-in-r2018a). – Cris Luengo Oct 12 '18 at 21:38
  • mxCreateSharedDataCopy is still in the API library, but MATLAB intentionally blocks you from linking with it by using a macro with the same name. If you #undef the macro you can still link with it. Same thing is true of other undocumented API functions as well. Read the notes in the matlab_version.h file for more details. I don't know why they keep making it harder for us to manage our own memory ... – James Tursa Oct 12 '18 at 22:03
  • They do that because they want us to switch to the C++ API. – Cris Luengo Oct 13 '18 at 03:33
  • There has been an amazing amount of work that went into this submission. I'm very impressed. Many thanks for creating it, maintaining it, and pointing it out here. – Cris Luengo Oct 13 '18 at 04:11
  • Note: If you are always using all of the data, then the SHAREDCHILD code above is overkill. The only thing you really need to do in this case is: (1) Create a shared data copy, (2) Hack the mxArray header to set/reset the complex bit flag, and (3) Change the dimensions (multiply one of them by 2 or 1/2 depending on which way you are going). Maybe I will create this simplified version and submit it to the FEX when I get some time. – James Tursa Oct 18 '18 at 21:02
  • Yes, it is overkill, I was thinking of removing most of its functionality, as I don't need to keep references of the arrays in my case. I guess that would leave just the bits that you describe here. – Cris Luengo Oct 18 '18 at 21:31
  • The simplified files have been uploaded. They can be found in the MATLAB FEX submission link above. – James Tursa Oct 19 '18 at 17:45