
I'm struggling with this problem: I have 2 large 2D numpy arrays (about 5 GB) and I want to save them in a .mat file loadable from Matlab. I tried scipy.io and wrote:

from scipy.io import savemat

data = {'A': a, 'B': b}
savemat('myfile.mat', data, appendmat=True, format='5',
        long_field_names=False, do_compression=False, oned_as='row')

but I get the error: OverflowError: Python int too large to convert to C long

EDIT: Python 3.8, Matlab 2017b

Here is the traceback:

a.shape (600,1048261) of type <class 'numpy.float64'>

b.shape (1048261) of type <class 'numpy.float64'>

data = {'A': a, 'B': b}
savemat('myfile.mat', data, appendmat=True, format='5',
        long_field_names=False, do_compression=False, oned_as='row')
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-19-4d1d08a54148> in <module>
      1 data = {'A': a, 'B': b}
----> 2 savemat('myfile.mat', data, appendmat=True, format='5',
      3         long_field_names=False, do_compression=False, oned_as='row')

~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio.py in savemat(file_name, mdict, appendmat, format, long_field_names, do_compression, oned_as)
    277         else:
    278             raise ValueError("Format should be '4' or '5'")
--> 279         MW.put_variables(mdict)
    280 
    281 

~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in put_variables(self, mdict, write_header)
    847                 self.file_stream.write(out_str)
    848             else:  # not compressing
--> 849                 self._matrix_writer.write_top(var, asbytes(name), is_global)

~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write_top(self, arr, name, is_global)
    588         self._var_name = name
    589         # write the header and data
--> 590         self.write(arr)
    591 
    592     def write(self, arr):

~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write(self, arr)
    627             self.write_char(narr, codec)
    628         else:
--> 629             self.write_numeric(narr)
    630         self.update_matrix_tag(mat_tag_pos)
    631 

~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write_numeric(self, arr)
    653             self.write_element(arr.imag)
    654         else:
--> 655             self.write_element(arr)
    656 
    657     def write_char(self, arr, codec='ascii'):

~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write_element(self, arr, mdtype)
    494             self.write_smalldata_element(arr, mdtype, byte_count)
    495         else:
--> 496             self.write_regular_element(arr, mdtype, byte_count)
    497 
    498     def write_smalldata_element(self, arr, mdtype, byte_count):

~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write_regular_element(self, arr, mdtype, byte_count)
    508         tag = np.zeros((), NDT_TAG_FULL)
    509         tag['mdtype'] = mdtype
--> 510         tag['byte_count'] = byte_count
    511         self.write_bytes(tag)
    512         self.write_bytes(arr)

OverflowError: Python int too large to convert to C long
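For scale, here is a quick check of the sizes involved. It is only a sketch: it assumes the `byte_count` field assigned at line 510 of the traceback is an unsigned 32-bit integer, which the overflow suggests but the traceback does not state. The shapes are the ones reported above, and `UINT32_MAX` and `sizes` are names chosen for illustration.

```python
from math import prod

UINT32_MAX = 2**32 - 1   # assumed ceiling for byte_count in a v5 element tag
FLOAT64_BYTES = 8        # size of one numpy.float64 value

# Shapes as reported in the question.
shapes = {'A': (600, 1048261), 'B': (1048261,)}

# Compute the raw data size of each array without allocating anything.
sizes = {name: prod(shape) * FLOAT64_BYTES for name, shape in shapes.items()}

for name, nbytes in sizes.items():
    verdict = "exceeds the assumed limit" if nbytes > UINT32_MAX else "fits"
    print(f"{name}: {nbytes} bytes, {verdict}")
```

Under that assumption only `A` (about 5.03e9 bytes) is too large, while `B` (about 8.4e6 bytes) fits easily. That is consistent with saving part of the data working fine.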

I also tried hdf5storage:

 hdf5storage.write(data, 'myfile.mat', matlab_compatible=True)

but it fails too.

EDIT: it gives this warning:

\miniconda3\envs\work\lib\site-packages\hdf5storage\__init__.py:1306: 
 H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) 
 in h5py 3.0. To suppress this warning, pass the mode you need to 
 h5py.File(), or set the global default h5.get_config().default_file_mode, or 
 set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 
 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  f = h5py.File(filename)

Anyway, it creates a 5 GB file, but when I load it in Matlab I get a variable named after the file path and apparently without data.

Lastly I tried with h5py:

 import h5py

 hf = h5py.File('C:/Users/flavio/Desktop/STRA-pattern.mat', 'w')

 hf.create_dataset('A', data=a)
 hf.create_dataset('B', data=b)

 hf.close()

but the output file is not recognized/readable in Matlab.

Is splitting the only solution? Hope there is a better way to fix this issue.

cflayer
  • You may find a suitable answer in a similar question. [Link to similar question](https://stackoverflow.com/questions/35706697/saving-numpy-structure-array-to-mat-file) – Sanjiban Sengupta Aug 18 '20 at 22:10
  • Tell us about `a` and `b`, in particular `shape` and `dtype`. If `object` dtype, tell us about the elements. You may also need to show the full traceback. An error like that comes from inside the `savemat` function. That **"but it fails too"** error description is just plain bad manners. If you want help, give us full information. – hpaulj Aug 18 '20 at 22:32
  • Did you try saving just one array, or even part of one? Without the traceback it's hard to say if the problem is with the size or with the values. – hpaulj Aug 18 '20 at 23:36
  • It should be a problem of size, because if I try saving part of the data it works fine – cflayer Aug 19 '20 at 12:13
  • Maybe the easiest thing for simple arrays is to save them to npy-format and use https://github.com/kwikteam/npy-matlab/tree/master/npy-matlab for reading the files in matlab. – max9111 Aug 20 '20 at 08:34

1 Answer


For anyone still looking for an answer, this works with hdf5storage:

import hdf5storage

hdf5storage.savemat(save_path, data_dict, format='7.3', matlab_compatible=True, compress=False)
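A slightly fuller sketch of how that call might be wrapped, in case it helps. The helpers `needs_v73` and `save_mat` are hypothetical names, not part of either library. The 4 GiB threshold is an assumption about what the v5 writer in scipy can handle, inferred from the overflow in the question. The idea is to fall back to the HDF5-based 7.3 format only when an array is too large for `scipy.io.savemat`.

```python
UINT32_MAX = 2**32 - 1  # assumption: largest element size the v5 writer can tag


def needs_v73(arrays):
    """Return True if any array exceeds the assumed v5 size limit."""
    return any(arr.nbytes > UINT32_MAX for arr in arrays.values())


def save_mat(path, arrays):
    """Save a dict of NumPy arrays to a .mat file (hypothetical helper).

    Uses hdf5storage with format 7.3 for oversized data, scipy otherwise.
    """
    if needs_v73(arrays):
        import hdf5storage  # third-party package, imported only when needed
        hdf5storage.savemat(path, arrays, format='7.3',
                            matlab_compatible=True, compress=False)
    else:
        from scipy.io import savemat
        savemat(path, arrays, do_compression=False, oned_as='row')
```

With the arrays from the question, `save_mat('myfile.mat', {'A': a, 'B': b})` would take the hdf5storage branch, since `a` alone is about 5 GB.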