3

I'm currently working on a project that involves saving/loading quite large MAT files (around 150 MB), and I realized that accessing a loaded cell array is much slower than accessing the equivalent array created inside a script or a function.

I created this example to reproduce my code and show the difference:

clear; clc;

disp('Test for computing with loading');

% Remove any stale MAT file from a previous run
if exist('data.mat', 'file')
    delete('data.mat');
end

% Build 10000 cells, each holding 4096 random doubles (~313 MB in total)
n_tests = 10000;
data = cell(1, n_tests);
for i = 1:n_tests
    data{i} = rand(1, 4096);
end

% disp('Saving data');
% save('data.mat', 'data');
% clear('data');
%
% disp('Loading data');
% load('data.mat', '-mat');

% Time one pass of pairwise squared differences for each i
for i = 1:n_tests
    tic;
    for j = 1:n_tests
        d = sum((data{i} - data{j}) .^ 2);
    end
    time = toc;
    disp(['#' num2str(i) ' computed in ' num2str(time) ' s']);
end

In this code, no MAT file is saved or loaded, and the average time for one iteration over i is 0.75s. When I uncomment the lines that save and load the file, one iteration over i takes about 6.2s (the saving/loading time itself is not counted). That makes it about 8x slower!

I'm using MATLAB 7.12.0 (R2011a) 64-bit on Windows 7 64-bit, and the MAT files are saved in the v7.3 format.

Could it be related to the compression of the MAT file, or to variable caching? Is there any way to prevent or avoid this?

Zaheer Ahmed
Ben B.
  • Is there an important reason why you use cells and not a matrix? – cyborg Nov 28 '11 at 11:11
  • In fact, the difference with and without saving/loading is smaller for matrices than for cell arrays, but the problem is still there: for 10000 tests, the computation takes 4.3s vs 1.5s with a matrix (10000 rows), and 6.1s vs 0.4s with a cell array (10000 elements). – Ben B. Nov 28 '11 at 11:53

2 Answers

5

I know this problem too. I think it is related to MATLAB's inefficient memory management; as I recall, it does not handle swapping well. A 150 MB file can easily hold a lot of data, maybe more than can be allocated quickly.

I just made a quick calculation for your example, using the per-cell overhead figures published by MathWorks. In your case, total_size = n_tests*121 + n_tests*(1*4096*8) bytes, which is about 313 MB.
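A minimal sketch of that arithmetic in MATLAB (the 121-byte per-cell header is the figure quoted above, not something I measured):

n_tests  = 10000;
overhead = n_tests * 121;             % per-cell header bytes (figure quoted above)
payload  = n_tests * (1 * 4096 * 8);  % 4096 doubles of 8 bytes in each cell
total_MB = (overhead + payload) / 2^20  % prints ~313.7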

First, I would suggest saving the data in format v7 instead of v7.3; I have noticed very poor performance when reading the newer format. That alone could be the reason for your slowdown.
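Forcing the older format is a one-flag change to the save call (assuming the variable is still called data, as in the question):

% Save in the pre-HDF5 v7 format instead of letting v7.3 be used
save('data.mat', 'data', '-v7');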

Personally I solved this in two ways:

  1. Split the data into smaller sets, then use functions that load the data when needed or create it on the fly (this can be done elegantly with classes); see the sketch after this list.
  2. Move the data into a database. SQLite and MySQL are great. Both work efficiently with MUCH larger datasets (terabytes instead of gigabytes), and SQL is quite efficient for quickly extracting subsets to manipulate.
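As a minimal sketch of option 1, assuming the data has been split into hypothetical files data_0001.mat, data_0002.mat, ... that each hold a cell array named chunk (save this as load_chunk.m):

function d = load_chunk(k)
% Return chunk k, reading it from disk only when it is not already cached.
% The file naming and the variable name 'chunk' are assumptions for this sketch.
persistent cached_k cached_chunk
if isempty(cached_k) || cached_k ~= k
    s = load(sprintf('data_%04d.mat', k), 'chunk');
    cached_chunk = s.chunk;
    cached_k = k;
end
d = cached_chunk;

A class wrapping the same caching logic works equally well and hides the file layout from the calling code.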
bdecaf
  • The size is around 300 MB indeed... but that should be easily manageable by 64-bit MATLAB, shouldn't it? I changed the format from 7.3 to 7 and there is no slowdown anymore; the performance is the same with or without saving! I'll take a look at SQL in the future, as the data could get much bigger than that. Still, there remains the difference between cell arrays and matrices (the former being faster). Could it be related to the size of the matrix and its memory management? Anyway, thanks a lot for the tip! – Ben B. Nov 28 '11 at 12:47
  • +1 Agreed. I personally use MS SQL Server Express, which comes with a nice free management tool, useful if you're new to it. For connecting to databases, I recommend this library http://www.mathworks.com/matlabcentral/fileexchange/29615-adodbtools instead of the built-in MATLAB one. – James Jul 23 '13 at 19:46
-1

I tested this code on 64-bit Windows with 64-bit MATLAB R2014b.

Without saving and loading, the computation takes around 0.22s. Saving the data file with '-v7' and then loading it, the computation takes around 0.2s. Saving the data file with '-v7.3' and then loading it, the computation takes around 4.1s. So it is related to the compression of the MAT file.

zbyan