
This relates to my last question, which can be found here. I'm dealing with lists similar to the one I describe in that link as markerList, that is, a list with three levels. I need to save this information as a .mat file, but I can't get it to save as the right type. scipy.io.savemat saves the list as a 200x40x2 single, when it should be a set of 200 cells, each containing a 40x2 cell.

The code I'm using to save this is:

import scipy.io

matdict = dict(markers=markerList, sorted=finalStack)
scipy.io.savemat(r'C:\pathname\sortedMarkers.mat', matdict)

What is confusing to me is that it saves markerList in the correct format (1x200 cell, each a cell of varying size), but not finalStack (saved as a 200 x 40 x 2 single). On top of that, before I had figured out the rest of this code, it would save finalStack correctly - which makes me think that perhaps it saves as a cell when the data it is saving isn't uniform in size. (finalStack is uniform in size; markerList is not.)

Is there a way to save a complicated data structure like this as a .mat file?

godfreap
  • I have to ask, just to confirm this isn't a case of [XY Problem](http://xyproblem.info) will you be using this in Matlab? Or are you just using `scipy.io.savemat` as a generic library of saving a python workspace? Because if it's the latter, there are better alternatives. – Tasos Papastylianou Aug 15 '16 at 18:54
  • But saving a (200,40,2) array where you expected a (200,) array of (40,2) arrays isn't a problem with `savemat`. If my diagnosis is right, it's an issue with how numpy turns a list of arrays into an array. – hpaulj Aug 15 '16 at 19:08
  • Tasos- this is not an XY problem. Basically everyone I work with uses Matlab, and an array of cells is the closest thing to nested lists that I am aware of. Also, I already have a script written to batch-process my data once it's saved in the appropriate format (not something that couldn't be redone, but motivation nonetheless). Any other formats seem very bulky and unwieldy when it comes to saving a large amount of nested data like this. – godfreap Aug 16 '16 at 11:15

2 Answers


As per savemat documentation, convert into a numpy array of 'objects':

from scipy.io import savemat
import numpy

a = numpy.array([[1,2,3],[1,2,3]])
b = numpy.array([[2,3,4],[2,3,4]])
c = numpy.array([[3,4,5],[3,4,5]])
L = [a,b,c]

# numpy.object was removed in recent NumPy releases; the builtin object works everywhere
FrameStack = numpy.empty((len(L),), dtype=object)
for i in range(len(L)):
    FrameStack[i] = L[i]

savemat("myfile.mat", {"FrameStack":FrameStack})

In octave:

>> load myfile.mat 

>> whos FrameStack
Variables in the current scope:

   Attr Name            Size                     Bytes  Class
   ==== ====            ====                     =====  ===== 
        FrameStack      1x3                        144  cell

Total is 3 elements using 144 bytes

>> whos FrameStack{1}
Variables in the current scope:

   Attr Name               Size                     Bytes  Class
   ==== ====               ====                     =====  ===== 
        FrameStack{1}      2x3                         48  int64

Total is 6 elements using 48 bytes
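
For the three-level case in the question (200 cells, each holding a 40x2 cell), the same idea can be nested. The following is only a sketch under assumptions: `to_nested_cells` is a hypothetical helper, `final_stack` is stand-in data (3 frames instead of 200), and every frame is assumed to be a 2-D numeric array.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def to_nested_cells(frames):
    """Hypothetical helper: 1-D object array whose items are object arrays."""
    outer = np.empty((len(frames),), dtype=object)
    for i, frame in enumerate(frames):
        # astype(object) turns a (40, 2) numeric array into a (40, 2) object
        # array, which savemat writes as a 40x2 cell instead of a matrix.
        outer[i] = np.asarray(frame).astype(object)
    return outer


# Stand-in for finalStack: equally shaped frames.
final_stack = [np.full((40, 2), float(k)) for k in range(3)]
cells = to_nested_cells(final_stack)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nested_cells.mat")
    savemat(path, {"sorted": cells})
    loaded = loadmat(path)

print(loaded["sorted"].shape)        # outer cell array
print(loaded["sorted"][0, 0].shape)  # inner cell array
print(loaded["sorted"][0, 0].dtype)  # object dtype marks a cell
```

If the inner level should stay a plain 40x2 matrix rather than a cell, drop the `astype(object)` call and store the frame directly.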
Tasos Papastylianou
  • But what happens when `A,B,C` are all the same shape? – hpaulj Aug 15 '16 at 20:32
  • I see what you're getting at here, but the contents of my list are related and dependent. That is, I can't just break finalStack[0][1][0] into its separate elements; it is meaningless at that point. Anyways, I see what you're saying, and I might be able to figure it out from here... – godfreap Aug 16 '16 at 11:29
  • I realize I need to make a Python object where finalStack[0][0] is {1}(1) in Matlab, but I can't figure that out, hence the question. Appending finalStack to a list (`b.append(finalStack[i])` over the range, where `b = [ ]`), however, does not resolve the problem - they are still saved as a single. (`finalStack[i]` is a 40x2 np.array for all elements i.) Still working on it, just no luck yet. – godfreap Aug 16 '16 at 12:42
  • I agree. I think right now the easiest way to get this done faster rather than later is to screw around with the mat file as it's put out, rather than trying to mess with the Python interpretation. Was trying to minimize work on the Matlab end, but such is life. Thanks for the help! – godfreap Aug 16 '16 at 13:18
  • @godfreap my apologies, I just reran your example and I see what you mean about same shapes resulting in concatenated arrays. The solution is to use numpy object arrays instead; this is stated in the savemat documentation. I'll update my answer to show you an example. – Tasos Papastylianou Aug 16 '16 at 13:58
  • @godfreap the edited answer should solve your problem, I apologise for the confusion; you're absolutely right that savemat treats lists differently for equally-shaped arrays, whereas my earlier solution used arrays of different shapes (which you said could be the case in the question), hence I didn't spot the problem at first. I deleted my previous comments since they were misleading in the context of this update; feel free to do the same. :) – Tasos Papastylianou Aug 16 '16 at 14:10
  • yeah, that worked. After I found a valid workaround too. haha, thanks! – godfreap Aug 16 '16 at 14:23

Without looking again at your previous question, I suspect the issue is with how numpy.array creates arrays from lists of sublists or arrays.

You note that markerList is saved as expected, and that the cells vary in size.

Try

np.array(markerList)

and look at its shape and dtype. I'm guessing it will be 1d (200,), and object dtype.

np.array(finalStack)

on the other hand probably will be the 3d array it saves.
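
That check can be sketched as follows. The two lists are stand-ins for markerList and finalStack, since the real data is not shown in the question. `dtype=object` is passed explicitly for the ragged list because recent NumPy versions refuse to build a ragged array without it.

```python
import numpy as np

# Stand-in for markerList: sub-arrays of varying length.
ragged = [np.ones((5, 2)), np.ones((3, 2)), np.ones((7, 2))]
# Stand-in for finalStack: sub-arrays that all share one shape.
uniform = [np.ones((40, 2)) for _ in range(3)]

ragged_arr = np.array(ragged, dtype=object)
uniform_arr = np.array(uniform)

print(ragged_arr.shape, ragged_arr.dtype)    # 1-D, object dtype
print(uniform_arr.shape, uniform_arr.dtype)  # 3-D, numeric dtype
```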

savemat is set up to save numpy arrays, not python dictionaries and lists - it is, after all, talking to MATLAB, where everything used to be a 2d matrix. MATLAB cells generalize this; they are more like 2d numpy arrays of dtype object.

The issue of creating an object array from elements that are uniform in size comes up often. The usual solution is to create an empty array of the desired size (e.g. (200,)) and object dtype, and load the subarrays into that.

https://stackoverflow.com/a/38776674/901925
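
A minimal sketch of that pattern, assuming a list of 200 equally shaped arrays. `as_object_array` is a hypothetical helper name, and the explicit loop is just one way to fill the array slot by slot so that numpy never gets the chance to stack the subarrays.

```python
import numpy as np


def as_object_array(items):
    """Hypothetical helper: put each item in its own slot of a 1-D object array."""
    out = np.empty((len(items),), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # stored as-is, never merged with its neighbours
    return out


# Stand-in for finalStack: 200 equally shaped (40, 2) arrays.
final_stack = [np.zeros((40, 2)) for _ in range(200)]
stack = as_object_array(final_stack)

print(stack.shape, stack.dtype)
print(stack[0].shape)
```

Passing `stack` as a value in the dict given to savemat should then produce a cell rather than a 200x40x2 matrix.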

=============

I'll demonstrate. Make 3 arrays, two of one size and a third of a different size:

In [58]: import numpy as np
In [59]: from scipy import io

In [60]: A=np.ones((40,2))    
In [61]: B=np.ones((40,2))
In [62]: C=np.ones((30,2))

Save two lists, one with just two arrays, the other with all three:

In [63]: io.savemat('test.mat', {'AB':[A,B],'ABC':[A,B,C]})

Load it back; I could do this in octave instead:

In [65]: D=io.loadmat('test.mat')

In [66]: D.keys()
Out[66]: dict_keys(['ABC', '__header__', 'AB', '__globals__', '__version__'])

ABC is a 2d array with 3 elements:

In [68]: D['ABC'].shape
Out[68]: (1, 3)
In [71]: D['ABC'][0,0].shape
Out[71]: (40, 2)

but AB has been transformed into a 3d array:

In [69]: D['AB'].shape
Out[69]: (2, 40, 2)
In [70]: np.array([A,B]).shape
Out[70]: (2, 40, 2)

If I instead make a 1d object array to hold A and B, it is preserved:

In [72]: AB=np.empty((2,),object)
In [73]: AB[...]=[A,B]
In [74]: AB.shape
Out[74]: (2,)

In [75]: io.savemat('test.mat', {'AB':AB,'ABC':[A,B,C]})
In [76]: D=io.loadmat('test.mat')

In [77]: D['AB'].shape
Out[77]: (1, 2)
In [78]: D['AB'][0,0].shape
Out[78]: (40, 2)
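
Going the other way, the loaded cell can be unpacked into a plain Python list. This is a sketch that assumes the cell is 1xN, as in the session above; the file name and the `arrays` variable are made up for the example.

```python
import os
import tempfile

import numpy as np
from scipy import io

A = np.ones((40, 2))
B = np.zeros((40, 2))

# Same construction as above, filled slot by slot.
AB = np.empty((2,), dtype=object)
AB[0] = A
AB[1] = B

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "roundtrip.mat")
    io.savemat(path, {"AB": AB})
    D = io.loadmat(path)

# loadmat returns the cell as a (1, 2) object array; ravel() flattens it
# so that each cell becomes one list item.
arrays = list(D["AB"].ravel())

print(len(arrays), arrays[0].shape)
```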

A good alternative is to save the arrays as items of a dictionary:

io.savemat('test.mat',{'A':A, 'B':B, 'C':C})

Given the difficulties in translating MATLAB structures to numpy ones and back, it's better to keep things flat and simple than to try to build compound objects that work equally well on both sides.
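
For a long list, one way to keep things flat is to give each array its own variable name. A sketch, assuming names of the form `frame_000`, `frame_001`, ... are acceptable on the MATLAB side (the prefix and the stand-in data are invented for this example):

```python
import os
import tempfile

import numpy as np
from scipy import io

# Stand-in data: equally shaped arrays, as in finalStack.
frames = [np.full((40, 2), float(k)) for k in range(3)]

# MATLAB variable names must start with a letter, hence the prefix.
flat = {f"frame_{i:03d}": frame for i, frame in enumerate(frames)}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "flat.mat")
    io.savemat(path, flat)
    loaded = io.loadmat(path)

# Skip the bookkeeping keys (__header__, __version__, __globals__).
names = sorted(k for k in loaded if not k.startswith("__"))
print(names)
```

The trade-off is that MATLAB code then has to loop over variable names instead of indexing a single cell.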

===============

I installed Octave. Loading this test.mat:

io.savemat('test.mat', {'AB':AB,'ABs':[A,B]})

gives

>> whos
Variables in the current scope:

   Attr Name        Size                     Bytes  Class
   ==== ====        ====                     =====  =====
        AB          1x2                       1280  cell
        ABs         2x40x2                    1280  double

An object dtype array is saved as a MATLAB cell; other arrays as MATLAB matrices. (I'd have to review earlier answers to recall the equivalent of MATLAB structures).
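
On the structure side, a Python dict passed as a value appears to be written by savemat as a MATLAB struct. A small sketch to check this, with made-up field names; it is worth confirming against the SciPy version actually in use.

```python
import os
import tempfile

import numpy as np
from scipy import io

# Field names here are invented for illustration.
info = {"label": "sorted", "frame": np.ones((40, 2))}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "struct.mat")
    io.savemat(path, {"info": info})
    loaded = io.loadmat(path)

# A struct comes back as a structured array whose dtype lists the fields.
print(loaded["info"].dtype.names)
print(loaded["info"]["frame"][0, 0].shape)
```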

hpaulj