I am trying to create .mat data files using python. The matlab code expects the data to have a certain format, where two-dimensional ndarrays of non-uniform sizes are stored as objects in a column vector. So, in my case, there would be k numpy arrays of shape (m_i, n) - with different m_i for each array - stored in a numpy array with dtype=object of shape (k, 1). I then add this object array to a dictionary and pass it to scipy.io.savemat().
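For concreteness, here is a minimal sketch of the setup described above. The row counts, the variable name "data" and the in-memory buffer are placeholders chosen for illustration, not my real data; in practice a file name would be passed to savemat instead.

```python
import io

import numpy as np
from scipy.io import savemat

rng = np.random.default_rng(0)

n = 4
row_counts = [2, 3, 5]  # hypothetical m_i values, all different

# Object column vector of shape (k, 1); each cell holds an (m_i, n) float array.
cells = np.empty((len(row_counts), 1), dtype=object)
for i, m in enumerate(row_counts):
    cells[i, 0] = rng.random((m, n))

# savemat accepts a file name or a file-like object; a buffer keeps the sketch self-contained.
buffer = io.BytesIO()
savemat(buffer, {"data": cells})
```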
This works fine so long as the m_i are indeed different. If all k arrays happen to have the same number of rows m_i, the behaviour becomes strange. First of all, it requires very explicit assignment to a numpy array of dtype=object that has been initialised to the final size k, otherwise numpy simply creates a three-dimensional array. But even when I have the correct format in python and store it to a .mat file using savemat, there is some kind of problem in the translation to the matlab format.
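The numpy side of this can be sketched as follows, with k, m and n as illustrative values. Building the object array straight from a list of equally shaped arrays gives a three-dimensional array, whereas preallocating and assigning element by element gives the (k, 1) layout I am after.

```python
import numpy as np

rng = np.random.default_rng(0)

k, m, n = 3, 2, 4  # illustrative sizes; every block has the same shape
blocks = [rng.random((m, n)) for _ in range(k)]

# Direct conversion: numpy stacks the equally shaped blocks into one 3-D object array.
stacked = np.array(blocks, dtype=object)
print(stacked.shape)  # (3, 2, 4)

# Explicit assignment into a preallocated object array keeps each block intact.
cells = np.empty((k, 1), dtype=object)
for i, block in enumerate(blocks):
    cells[i, 0] = block
print(cells.shape, cells[0, 0].shape, cells[0, 0].dtype)
```

Note that slices of the stacked array, such as stacked[0], are themselves arrays of dtype=object rather than float arrays.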
When I reload the data from the .mat file using scipy.io.loadmat, I find that I still have an object array of shape (k, 1), which still has elements of shape (m, n). However, each element is no longer an int or a float but is instead a numpy array of shape (1, 1) that has to be further indexed to access the contained int or float. So an individual element of an object vector that was supposed to be a numpy array of shape (2, 4) would look something like this:
[array([[array([[0.82374894]]), array([[0.50730055]]),
array([[0.36721625]]), array([[0.45036349]])],
[array([[0.26119276]]), array([[0.16843872]]),
array([[0.28649524]]), array([[0.64239569]])]], dtype=object)]
This also poses a problem for the matlab code that I am trying to build my data files for. It runs fine when the object vector holds arrays of different shapes but breaks when all of the contained arrays have the same shape.
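A round-trip check along these lines may help narrow things down. It is only a diagnostic sketch with made-up sizes and an in-memory buffer: it prints the dtype of each inner array before saving and the dtype and shape of each element after loading. My working assumption, which I have not confirmed, is that the (1, 1) wrapping shows up when the inner arrays themselves have dtype=object instead of a numeric dtype.

```python
import io

import numpy as np
from scipy.io import loadmat, savemat

rng = np.random.default_rng(0)

k, m, n = 3, 2, 4  # illustrative sizes; every inner array has the same shape
cells = np.empty((k, 1), dtype=object)
for i in range(k):
    cells[i, 0] = rng.random((m, n))  # inner arrays are float64 here

# Before saving: the inner dtype should be numeric, not object.
print([inner.dtype for inner in cells[:, 0]])

buffer = io.BytesIO()
savemat(buffer, {"data": cells})
buffer.seek(0)
loaded = loadmat(buffer)["data"]

# After loading: inspect what each cell came back as.
for element in loaded[:, 0]:
    print(element.dtype, element.shape)
```

If the dtypes printed before saving include object, converting each inner array with something like np.asarray(inner, dtype=float) before assigning it would be the first thing I would try.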
I know this is a rather obscure and possibly unavoidable issue but I figured I would see if anyone else has encountered it and found a fix. Thanks.