I've just spent a while debugging some particularly slow code and have been completely thrown off by the MATLAB profiler. This looks to me like a massive bug, so I was wondering if anyone could cast any light on to what is going on here.
Here is some code that will cause the problem:
function profiler_test
%%% Create 20 files with random data
count = 20;
for i = 1 : count
x = rand(3);
save(sprintf('temp_file_%06d', i), 'x');
end
%%% Load them in a for loop
xs = cell(1, count);
tic;
for i = 1 : count
x = load(sprintf('temp_file_%06d', i), 'x');
xs{i} = x.x;
end
toc
%%% Load them in a for loop, but writing a small log file on the way
tic;
for i = 1 : count
x = load(sprintf('temp_file_%06d', i), 'x');
xs{i} = x.x;
file = fopen(sprintf('temp_logfile_%d', i), 'w');
fprintf(file, 'Success\n');
fclose(file);
end
toc
end
The first for
loop takes 0.239739 seconds, the second takes 4.411179.
Now, I should make it clear that I am aware of the sloppy idea I had, shown in the second for
loop example, of creating a log file for each result - it was because I was running on a cluster where I couldn't see the output, I wanted a cheap indication of the function's progress, and this turned out to be the bottleneck. I'm fine with that.
My problem however is that I've spent a day trying to optimise the wrong line, because the MATLAB profiler says this:
1 24 tic;
1 25 for i = 1 : count
4.41 20 26 x = load(sprintf('temp_file_%06d', i), 'x');
20 27 xs{i} = x.x;
28
20 29 file = fopen(sprintf('temp_logfile_%d', i), 'w');
20 30 fprintf(file, 'Success\n');
20 31 fclose(file);
20 32 end
1 33 toc
It's placed the entire time taken to execute the final three lines on the line for load
. In my actual program, the load
was not so close to the other bit so it didn't occur to me until I decided to distrust the profiler. My question is: what is going on here? Why has this happened and should I be watching out for any more bizarre behaviour like this?
I'm using MATLAB 2011a. Many thanks.
EDIT: I seem to be causing some confusion, apologies. Here is the situation:
- The two
for
loops shown above are identical, except that the second one has three lines at the bottom which write to a temporary file each iteration. - The second loop takes substantially longer to run: the conclusion is that those last three lines are to blame for the speed increase. When they are removed, the code is fast again.
- However, the profiler does not attribute any of the time for the second loop to those final three statements. Instead, it tells me that my
load
function call - exactly the same call as the first loop, which was faster - is now taking 4 seconds instead of 0.2. So either the presence of the last three lines causes theload
to be slow (I had disregarded this; is that even a possibility?), OR the MATLAB profiler is incorrectly reporting thatload
is taking 4 seconds when it is clearly not.
Either way it seems to me that something very strange is happening.
EDIT: Seem to have answered it myself, see below. Changed the title as it was misleading