
I am writing a MEX function that has to return a huge array of strings.

I do this as follows:

  mxArray * array = mxCreateCellMatrix(ARRAY_LEN, 1);
  for (size_t k = 0; k < ARRAY_LEN; ++ k) {
      mxArray *str = mxCreateString("Hello");
      mxSetCell(array, k, str);
  }
  plhs[0] = array;

However, since the string always has the same value, I would like to create only one instance of it, like this:

  mxArray * array = mxCreateCellMatrix(ARRAY_LEN, 1);
  mxArray *str = mxCreateString("Hello");

  for (size_t k = 0; k < ARRAY_LEN; ++ k) {
      mxSetCell(array, k, str);
  }
  plhs[0] = array;

Is this possible? How would the garbage collector know when to release it? Thank you.

user1626803
    @Shai: it will crash MATLAB as soon as you clear the variable returned from the MEX-function. Cell arrays are destructed in a deep recursive manner, one-by-one. However all pointers stored are basically the same array, so attempting to free the same memory multiple times will result in a memory corruption. – Amro Sep 17 '13 at 12:24

4 Answers


The second snippet you suggested is not safe and should not be used, as it can crash MATLAB. Instead, you should write:

mxArray *arr = mxCreateCellMatrix(len, 1);
mxArray *str = mxCreateString("Hello");
for(mwIndex i=0; i<len; i++) {
    mxSetCell(arr, i, mxDuplicateArray(str));
}
mxDestroyArray(str);
plhs[0] = arr;

Unfortunately, this is not the most memory-efficient approach. Imagine that instead of a tiny string we were storing a very large matrix (duplicated across all the cells).


Now it is possible to do what you initially wanted, but you'll have to resort to undocumented hacks (like creating shared data copies or manually incrementing the reference count in the mxArray_tag structure).

In fact this is what usually happens behind the scenes in MATLAB. Take this for example:

>> c = cell(100,100);
>> c(:) = {rand(5000)};

As you know a cell array in MATLAB is basically an mxArray whose data-pointer points to an array of other mxArray variables.

In the case above, MATLAB first creates an mxArray corresponding to the 5000x5000 matrix. This will be stored in the first cell c{1}.

For the rest of the cells, MATLAB creates "lightweight" mxArrays that share their data with the first cell element, i.e. their data pointers point to the same block of memory holding the huge matrix.

So there is only one copy of the matrix at all times, unless of course you modify one of them (c{2,2}(1)=99), at which point MATLAB has to "unlink" the array and make a separate copy for this cell element.

You see, internally each mxArray structure has a reference counter and a cross-link pointer to make this data sharing possible.
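To make the sharing and "unlinking" described above concrete, here is a minimal toy model in plain C. It is only a sketch under stated assumptions: the `ToyArray` type and the `toy_*` functions are hypothetical names invented for illustration, and the real mxArray layout (a cross-link pointer plus a counter) is undocumented and more involved than a single shared counter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: this is NOT MATLAB's real mxArray layout. */
typedef struct {
    double *data;   /* payload, possibly shared between several handles */
    size_t  n;      /* number of elements */
    int    *refs;   /* shared counter: how many handles point at data */
} ToyArray;

/* Create a fresh array that owns its data (counter starts at 1). */
ToyArray toy_create(size_t n)
{
    ToyArray a;
    a.data = calloc(n, sizeof *a.data);
    a.refs = malloc(sizeof *a.refs);
    assert(a.data != NULL && a.refs != NULL);
    a.n = n;
    *a.refs = 1;
    return a;
}

/* Make a "lightweight" copy: same data pointer, counter bumped. */
ToyArray toy_share(ToyArray src)
{
    ++*src.refs;
    return src;
}

/* Write one element; unlink first if the data is shared (copy-on-write). */
void toy_set(ToyArray *a, size_t i, double v)
{
    assert(i < a->n);
    if (*a->refs > 1) {
        ToyArray own = toy_create(a->n);
        memcpy(own.data, a->data, a->n * sizeof *a->data);
        --*a->refs;   /* this handle leaves the shared block */
        *a = own;
    }
    a->data[i] = v;
}

/* Drop one handle; the data is freed only when the last handle goes away. */
void toy_release(ToyArray *a)
{
    if (--*a->refs == 0) {
        free(a->data);
        free(a->refs);
    }
    a->data = NULL;
    a->refs = NULL;
}
```

In this model, `toy_share` plays the role of the lightweight cells and `toy_set` the role of an assignment such as c{2,2}(1)=99: the data is copied only at the moment a shared handle is written to.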

Hint: You can study this data-sharing behavior by turning on the format debug option and comparing the pr pointer addresses of the various cells.

The same concept holds true for structure fields, so when we write:

x = rand(5000);
s = struct('a',x, 'b',x, 'c',x);

all the fields would point to the same copy of the data in x.


EDIT:

I forgot to show the undocumented solution I mentioned :)

mex_test.cpp

#include "mex.h"

extern "C" mxArray* mxCreateReference(mxArray*);

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize len = 10;
    mxArray *arr = mxCreateCellMatrix(len, 1);
    mxArray *str = mxCreateString("Hello");
    for(mwIndex i=0; i<len; i++) {
        // I simply replaced the call to mxDuplicateArray here
        mxSetCell(arr, i, mxCreateReference(str));
    }
    mxDestroyArray(str);
    plhs[0] = arr;
}

MATLAB

>> %c = repmat({'Hello'}, 10, 1);
>> c = mex_test()
>> c{1} = 'bye'
>> clear c

The mxCreateReference function will increment the internal reference counter of the str array each time it is called, thus letting MATLAB know that there are other copies of it.

So when you clear the resulting cell array, MATLAB will in turn decrement this counter once for each cell, until the counter reaches 0, at which point it is safe to destroy the array in question.

Using the array directly (mxSetCell(arr, i, str)) is problematic because the ref-counter immediately reaches zero after destroying the first cell. Thus for subsequent cells, MATLAB will attempt to free arrays that have already been freed, resulting in memory corruption.
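The difference between the two variants can be sketched with a small counting model in plain C. This is an illustration only, not MATLAB's actual bookkeeping: `ToyElem` and the `toy_*` functions are hypothetical names, and the assumption that the counter starts at 1 on creation is a simplification made for the model.

```c
#include <assert.h>

/* Toy stand-in for the shared string; not MATLAB's real mxArray. */
typedef struct {
    int refs;       /* live references; assumed to be 1 right after creation */
    int frees;      /* times the memory was actually freed */
    int bad_frees;  /* releases that arrived after the memory was gone */
} ToyElem;

void toy_init(ToyElem *e)
{
    e->refs = 1;
    e->frees = 0;
    e->bad_frees = 0;
}

/* Analogue of mxCreateReference: bump the counter, hand back the same object. */
ToyElem *toy_reference(ToyElem *e)
{
    assert(e->refs > 0);
    ++e->refs;
    return e;
}

/* Analogue of destroying one owner of the element. */
void toy_destroy(ToyElem *e)
{
    if (e->refs == 0) {       /* memory already gone: this would be a double free */
        ++e->bad_frees;
        return;
    }
    if (--e->refs == 0)
        ++e->frees;
}

/* Fill n cells with the same element, then clear them all.
   use_refs != 0 picks the reference-counted variant. */
ToyElem toy_fill_and_clear(int n, int use_refs)
{
    ToyElem str;
    ToyElem *cells[64];
    int i;

    assert(n > 0 && n <= 64);
    toy_init(&str);
    for (i = 0; i < n; ++i)
        cells[i] = use_refs ? toy_reference(&str) : &str;
    if (use_refs)
        toy_destroy(&str);        /* mirrors mxDestroyArray(str) in the MEX code */
    for (i = 0; i < n; ++i)
        toy_destroy(cells[i]);    /* mirrors "clear c" walking over the cells */
    return str;
}
```

With ten cells, the reference-counted variant ends with exactly one free and no invalid releases, while the direct variant frees the element on the first cell and then sees nine further releases of memory that is already gone.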

Amro
  • see here for more about shared arrays in MEX-files: http://www.mk.tu-berlin.de/Members/Benjamin/mex_sharedArrays – Amro Sep 17 '13 at 11:58
  • Here is a fantastic post by *James Tursa* summarizing all things related to MEX memory management we wish were properly documented and officially supported: http://www.mathworks.com/matlabcentral/answers/79046-mex-api-wish-list – Amro Sep 17 '13 at 12:54

Bad news ... as of R2014a (possibly R2013b but I can't check) mxCreateReference is no longer available in the library (either missing or not exported), so the link will fail. Here is a replacement function you can use that hacks into the mxArray and bumps up the reference count manually:

struct mxArray_Tag_Partial {
    void *name_or_CrossLinkReverse;
    mxClassID ClassID;
    int VariableType;
    mxArray *CrossLink;
    size_t ndim;
    unsigned int RefCount; /* Number of sub-elements identical to this one */
};

mxArray *mxCreateReference(const mxArray *mx)
{
    struct mxArray_Tag_Partial *my = (struct mxArray_Tag_Partial *) mx;
    ++my->RefCount;
    return (mxArray *) mx;
}
James Tursa
  • I just saw your post, and I wanted to clarify that many undocumented *C* functions were indeed removed from the MEX-API in R2014a (including `mxCreateReference` used in the example above). But thankfully new equivalent *C++* functions were added in their place. These are exposed under the `matrix::detail::noninlined::mx_array_api` namespace. See the explanation [here](http://undocumentedmatlab.com/blog/serializing-deserializing-matlab-data/#MEX) on how to update the code... Anyway, good to see you here James and welcome to Stack Overflow :) – Amro May 11 '15 at 15:26
  • For instance using [Dependency Walker](http://www.dependencywalker.com/), I can see the following function exported in `libmx.dll` on R2015a: `struct mxArray_tag * matrix::detail::noninlined::mx_array_api::mxCreateReference(struct mxArray_tag const *)` – Amro May 11 '15 at 15:30
  • Unfortunately I think this is no longer working as expected. Following Bruno and James in this post, it seems like it has all become a mess in the latest versions (2018-2020): https://www.mathworks.com/matlabcentral/answers/396103-mxcreateshareddatacopy-no-longer-supported-in-r2018a – Jimbo Feb 13 '21 at 20:57
  • I added my own answer which takes into account a change in the reference counter location that took place in 2019b. – Jimbo Feb 14 '21 at 04:25

In R2019b the position of the reference counter moved. As a workaround, I now detect the MATLAB version at run time and change the offset into the header accordingly. One could also do a compile-time check, but I wanted my MEX file to work across versions without recompiling. Note that since I'm no longer explicitly accessing a structure, I no longer have a partial structure definition. I also expose a compile-time flag, ALLOW_REF_COUNT: the reference-count hack is used only when it is defined; otherwise the function simply does a deep copy. Feedback/suggestions welcome ...

#include "mex.h"
#include <stdint.h>  /* uint32_t */
#include <stdlib.h>  /* atoi */
#include <string.h>  /* strchr */
int ref_offset = -1;

mxArray* mxCreateReference(const mxArray *mx){
    #ifdef ALLOW_REF_COUNT
        if (ref_offset == -1){
            //Grabs the output of version(), e.g. 9.9.0.15 etc.,
            //and translates it into 909. The minor version gets two digits
            //so that 9.12 becomes 912 and compares as newer than 9.9 (909)
            mxArray *version;
            mexCallMATLAB(1,&version,0, NULL, "version");
            char *str = mxArrayToString(version);
            char* loc = strchr(str, '.');
            int mantissa = atoi(loc+1);
            int whole = atoi(str);
            int version_id = whole*100 + mantissa;

            mxDestroyArray(version);
            mxFree(str);
            
            //_Static_assert => c11
            _Static_assert(sizeof(void *) == 8, "Error: 32bit MATLAB not supported");
            
            //907 -> 2019b
            if (version_id < 907){
                ref_offset = 8;
            }else{
                ref_offset = 6;
            }
        }

        uint32_t *ref_count = ((uint32_t *) mx) + ref_offset; 
        (*ref_count)++;

        //struct mxArray_Tag_Partial *my = (struct mxArray_Tag_Partial *) mx;
        //++my->RefCount;
        return (mxArray *) mx;
    #else
        return mxDuplicateArray(mx);
    #endif
}
Jimbo

@Jimbo , some comments on your posted code:

Your code makes a silent assumption that it is running a 64-bit MATLAB version, and that mwSize is 64-bit. If this is used in a 32-bit MATLAB version and mwSize is 32-bit then the ref_count position you calculate will not be correct.

The code will not work properly without the necessary headers for the library functions you are using. I.e., in C without the prototypes the functions that return float will be assumed to return int and calculated results will end up wrong. Maybe include these lines at the top to make this explicit:

#include <stdlib.h>  /* strtof */
#include <math.h> /* roundf */

I don't see any logic where you "add 0" to the single digit fractions to make 9.9 appear less than 9.12 for example as you indicate. E.g., a 9.12 is just going to result in a minor_ver of 1, not 12 as you indicate. This should be fixed up.
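The major/minor encoding discussed here can be sketched in plain C as follows. This is a minimal sketch, not code from either poster: `parse_version_id` is a hypothetical helper name, and it assumes the version string begins with "major.minor" and that the minor number stays below 100.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: turn "9.12.0..." into 912 and "9.9.0..." into 909.
   Returns -1 if the string does not start with "major.minor". */
int parse_version_id(const char *s)
{
    char *end;
    long major, minor;

    assert(s != NULL);
    major = strtol(s, &end, 10);
    if (end == s || *end != '.' || major < 0)
        return -1;                       /* no digits, or no '.' after them */
    s = end + 1;
    minor = strtol(s, &end, 10);
    if (end == s || minor < 0 || minor > 99)
        return -1;                       /* minor must fit in two digits */
    return (int)(major * 100 + minor);
}
```

Because the minor part is parsed as a whole integer rather than as a decimal fraction, 9.12 maps to 912 and 9.9 to 909, so a plain integer comparison against 907 orders the releases correctly.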

mexCallMATLAB creates the return mxArray from scratch. You do not need to "pre-allocate" the result. In fact, what you are doing just creates a memory leak since the pointer to the mxCreateNumericMatrix(etc) call gets overwritten by the mexCallMATLAB call. The solution is simply to define the return variable and nothing else. E.g.,

mxArray *version;

You should free the temporary memory you have used to calculate the version number. Yes, these will be on the garbage collection list (prior to R2017a str would not be on the garbage collection list), but it is good practice to free the memory as soon as you are done with it. E.g., after you calculate ref_offset do this:

mxDestroyArray(version);
mxFree(str);

The ref_count field of the mxArray is a 32-bit integer. It is next to another 32-bit integer that is used for bit flags (isComplex, isNumeric, isSparse, etc.). However, you are pointing at ref_count as if it is a 64-bit integer mwSize, and then incrementing it based on this. While this might work if the actual 32-bit ref_count happens to line up with the low order 32-bits of a 64-bit mwSize, this is a bit iffy IMO because it seems to depend on word ordering of the 64-bit integer. You might want to modify this to make it more robust.
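One way to make the access robust is to touch exactly 32 bits, so that the neighboring flags word can never be affected regardless of word ordering. The sketch below shows this in plain C on an arbitrary buffer; `bump_u32_at` is a hypothetical helper name, and the offsets used with it are whatever the caller supplies, not a statement about the real mxArray header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Increment the 32-bit word at index word_offset inside an opaque header.
   memcpy keeps the access exactly 4 bytes wide and sidesteps aliasing issues. */
uint32_t bump_u32_at(void *header, size_t word_offset)
{
    unsigned char *p;
    uint32_t v;

    assert(header != NULL);
    p = (unsigned char *)header + word_offset * sizeof(uint32_t);
    memcpy(&v, p, sizeof v);   /* read only the 32-bit counter */
    ++v;                       /* wraps within 32 bits, never carries out */
    memcpy(p, &v, sizeof v);   /* write back only those 4 bytes */
    return v;
}
```

Even when the counter overflows, the wrap stays inside its own 4 bytes, whereas a 64-bit increment at the same address could carry into (or, depending on byte order, directly modify) the adjacent field.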

You might also be interested in MATLAB version code (both compile time and run time) that is posted here: https://www.mathworks.com/matlabcentral/fileexchange/67016-c-mex-matlab-version

James Tursa
  • Thanks James! Not showing the include statements is a pet peeve of mine when trying to use other peoples' code -- my bad. Most comments made sense. Wasn't sure how to handle the 32/64 bit MATLAB (two pointers precede so subtract 2 when 32 bit?) - but I'm fine ignoring 32bit for now. Definitely had a bug in the minor version and got lucky with the ref count and lower bits toggling -- I was wondering why the count made no sense but still worked! – Jimbo Feb 15 '21 at 05:37