
The official documentation states the following:

[Image: table from the documentation comparing the MAT-file versions]

But I have noticed that there are other important differences besides those stated in the table above.

For example, saving a cell array with about 6,000 elements that occupies 176 MB of memory in MATLAB gives me the following results depending on whether I use -v7 or -v7.3:

  • With -v7: File size = 15 MB, and save & load is fast.
  • With -v7.3: File size = 400 MB, and save & load is very slow (probably in part because of the large file size).
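A minimal sketch of how such a measurement can be reproduced. The test data here is hypothetical (a cell array of random-length vectors, much smaller than the one described above), so the absolute numbers will differ:

```matlab
% Hypothetical test data: a cell array of random-length double vectors.
n = 2000;
C = cell(n, 1);
for k = 1:n
    C{k} = rand(1, randi(100));
end

% Save and load the same variable with each format flag,
% recording the file size and the elapsed times.
versions = {'-v7', '-v7.3'};
for i = 1:numel(versions)
    fname = [tempname '.mat'];
    tic; save(fname, 'C', versions{i}); tSave = toc;
    tic; S = load(fname);               tLoad = toc;
    info = dir(fname);
    fprintf('%-6s %10d bytes   save %.2f s   load %.2f s\n', ...
            versions{i}, info.bytes, tSave, tLoad);
    delete(fname);
end
```

Timings from a single run are noisy; repeating the loop a few times gives a more reliable picture.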

Has anybody else noticed these differences?

Update 1: As the replies point out, -v7.3 relies on HDF5, and according to MathWorks, "this format has a significant storage overhead". It is not clear, however, whether this overhead is due to the format itself or to the way MATLAB implements and handles HDF5.

Update 2: @Andrew Janke points us to this very helpful PDF (which apparently is not available in HTML format on the web). For more details, see the comments in the answer provided by @Amro.

This all takes me to the next question: Are there any alternatives that combine the best of both worlds (e.g. the efficiency of -v7 and the ability to deal with very large files of -v7.3)?

Amro
Amelio Vazquez-Reina
  • Those interested, check out this recent article: [Improving save performance](http://undocumentedmatlab.com/blog/improving-save-performance/) – Amro May 10 '13 at 05:30

1 Answer


Version 7.3 MAT-files use the HDF5 format, which has a significant storage overhead to describe the contents of the file, especially for complex nested cell arrays and structures. Its main advantage over previous versions of MAT-files is that it allows storing data larger than 2 GB on 64-bit systems.

Note that both v7 and v7.3 are compressed and use Unicode encoding (unlike v6), yet they are two completely different formats...
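As a rough illustration of where the overhead shows up, the following sketch saves the same numeric values once as a plain double matrix and once as a cell array, in both formats, and prints the resulting file sizes. The matrix size and the helper name `bytesOf` are arbitrary choices for this example:

```matlab
M = rand(200, 200);   % plain double matrix
C = num2cell(M);      % same values, one cell per element

% Hypothetical helper: size in bytes of a file on disk.
bytesOf = @(f) getfield(dir(f), 'bytes');

names = {'M', 'M', 'C', 'C'};
flags = {'-v7', '-v7.3', '-v7', '-v7.3'};
sizes = zeros(1, numel(flags));
for i = 1:numel(flags)
    fname = [tempname '.mat'];
    save(fname, names{i}, flags{i});
    sizes(i) = bytesOf(fname);
    delete(fname);
    fprintf('%s saved with %-6s : %d bytes\n', names{i}, flags{i}, sizes(i));
end
```

The matrix is the case where the two formats are expected to be close in size, while the cell array is where the per-element descriptions in the HDF5-based format should become visible.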


Amro
  • Thanks @Amro. I'm intrigued by the fact that "significant storage overhead" means that we need 400 MB instead of 15 MB for the exact same data, but I guess that explains everything. – Amelio Vazquez-Reina Feb 10 '11 at 15:58
  • @AmV: the thing with cell and structure arrays is that they can store heterogeneous data types, and each type needs to be "described". If you compare the two formats (v7/v7.3) using a regular MATLAB "double" matrix (e.g. `M = rand(3000,3000); save v7.mat M -v7; save v73.mat M -v7.3`), you would get similar file sizes. On the other hand, replace the above matrix with a cell array (`M = num2cell(M);`) and you will see a big difference in size... – Amro Feb 10 '11 at 16:13
  • See also http://www.mathworks.com/help/pdf_doc/matlab/matfile_format.pdf for a full description of the MAT-file format. Since HDF5 is a general-purpose format, some descriptive type info is stored as strings in the headers (e.g. "MATLAB_class", "double"). In the MAT format, built-in MATLAB types are described with binary magic cookies that fit in a couple of bytes, so the MAT headers can be as small as 56 bytes. If you're on Linux or Cygwin, "h5dump -p" and "od -c" will give you a view of the headers in the v7.3 files. – Andrew Janke Feb 11 '11 at 05:58
  • @Andrew, that's a very helpful piece of documentation. I always thought that the PDF documentation was replicated in the [official HTML documentation](http://www.mathworks.com/help/index.html). Maybe it is? I couldn't find an explanation of the MAT-File format with this level of detail in the HTML doc. – Amelio Vazquez-Reina Feb 11 '11 at 15:51
  • Thanks AmV. That PDF file is the only place I have seen this info. I don't think it's in the online documentation. MATLAB > User Guide > External interfaces > "Importing and Exporting MAT-files..." is the closest I've seen, and that only discusses the public C and Fortran APIs for it. – Andrew Janke Feb 11 '11 at 16:08
  • @AndrewJanke that PDF says "level 5". Does that correspond to matlab v6 files? – Johannes Schaub - litb Mar 29 '16 at 09:56
  • @JohannesSchaub-litb there's a bit of overlap with those version numbers but yes, MAT-files level 5 cover the `-v6` and `-v7` save/load flags, while level 4 corresponds to `-v4` flag. Note that HDF5-based MAT-files (`-v7.3` flag) are not discussed in that PDF document. So if you save a MAT-file using either `-v6` or `-v7` flags you actually get the following header at the beginning of the file `MATLAB 5.0 MAT-file, Platform: PCWIN, Created on: ...`. Using `-v7.3` you get `MATLAB 7.3 MAT-file, Platform: PCWIN, Created on: ... HDF5 schema 1.00 .`, while `-v4` doesn't produce any textual header. – Amro Mar 29 '16 at 11:06
  • Note that the versions in the save/load flags presumably correspond to the MATLAB versions in which they were introduced, although it is now more common to refer to MATLAB by its release name (as in R2016a). You can see the table [here](https://en.wikipedia.org/wiki/MATLAB#Release_history) for a mapping between MATLAB versions and their release names. The current documentation of save makes it a bit clearer: http://www.mathworks.com/help/matlab/ref/save.html#inputarg_version – Amro Mar 29 '16 at 11:10
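The textual headers mentioned in the comments can be inspected directly from MATLAB. This is a minimal sketch, assuming the descriptive text occupies the first 116 bytes of the file, as the Level 5 format PDF describes; the variable being saved is arbitrary:

```matlab
M = magic(4);
flags = {'-v7', '-v7.3'};
hdrs = cell(size(flags));
for i = 1:numel(flags)
    fname = [tempname '.mat'];
    save(fname, 'M', flags{i});

    % Read the first 116 bytes as characters (the descriptive text field).
    fid = fopen(fname, 'r');
    hdrs{i} = fread(fid, 116, 'uint8=>char')';
    fclose(fid);
    delete(fname);

    fprintf('%-6s %s\n', flags{i}, strtrim(hdrs{i}));
end
```

The platform name and creation date in the printed text depend on the machine and the time of saving.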