IMPORTANT UPDATE

I just discovered that, after restarting Matlab and the computer, this simplified code no longer reproduces the problem for me either... I am so sorry for taking up your time with a script that didn't work. However, the old problem still persists in my original script if I save anything in any folder (that I have tried) in the inner 'for' loop. For my purposes, I have worked around it by simply not making this save unless I absolutely need it. The original script has the following structure in terms of for loops and use of save or load:

load() % .mat files, size 365x92x240
for day = 1:365
    load() % .mat files, size 8x92x240

    for type = 1:17
        load() % .mat files size 17x92x240
        load() % .mat files size 92x240

        for step = 1:8
            %only calculations
        end
        save() % .mat files size 8x92x240

    end 
    save() % .mat files, size 8x92x240
end

% the loads and saves outside the main loop are in for loops too, but do not seem to affect the behavior described for the script above
load() % .mat files size 8x92x240
save() % .mat files size 2920x92x240
load() 
save() % .mat files size 365x92x240
load()
save() % .mat files size 12x92x240

If run in full, the script saves approx. 10 GB and loads approx. 2 GB of data.

The entire script is rather lengthy and makes a lot of saves and loads. Unfortunately, it would be rather impractical to share it all here before I have managed to reproduce the problem in a reduced version. Since I discovered, to my frustration, that the very same code could behave differently from time to time, it immediately became more tedious than anticipated to find a simplification that consistently reproduces the behavior. I will get back as soon as I am sure I have manageable code that reproduces the problem.


PREVIOUS PROBLEM DESCRIPTION (NB. The code below is not guaranteed to reproduce the described problem.):

I just learnt the hard way that, in Matlab, you can't save to a folder named temp inside a for loop without slowing down data loading in the next iteration of the loop. My question is: why?

If you are interested in reproducing the problem yourself, please see the code below. To run it, you will also need a matfile called anyData.mat to load and two folders for saving, one called temp and the other called temporary.

clear all;clc;close all;profile off;
profile on

endDay = 2; % last day index; the loop runs over days 0..endDay
tT= zeros(1,endDay+1);
tTD= zeros(1,endDay+1);

for day = 0:endDay
    tic
    T = importdata('anyData.mat'); % semicolon keeps the data from being printed
    tT(day+1)=toc; %loading time in seconds

    tic
    TD = importdata('anyData.mat');
    tTD(day+1)=toc;

    for type = 0:1
        saveFile = ones(92,240);

        save('AnyFolder\temporary\saveFile.mat', 'saveFile') % leads to fast data loading 
        %save('AnyFolder\temp\saveFile.mat', 'saveFile') %leads to slow data loading

    end % end of type 

end% end of day

profile off
profile report

plot(tT)

On the y-axis of the plot, you will see that data loading takes significantly longer when the inner for loop saves to temp rather than to temporary. Does anyone know why this occurs?

tshepang
LaWa
  • Seems strange. Definitely can't reproduce it. Maybe you can upload your data. Also, to make the example more concise, why don't you remove all the unused importdata commands? – bdecaf Jan 15 '13 at 10:17
  • Nope, not reproducible here either. Actually, using "temp" is faster... – Rody Oldenhuis Jan 15 '13 at 11:11
  • Matlab version, operating system, type of storage ... ? – s-m-e Jan 15 '13 at 12:32
  • Is there network storage involved? ;) – XORcist Jan 15 '13 at 12:52
  • @ernestopheles I use Matlab version 7.3, Windows 7 and save to the local hard drive. I have also tried to load and save mat-files in v6 without much difference in this particular behaviour. – LaWa Jan 18 '13 at 09:01
  • I have never observed this behavior. I think you're doing something else wrong... – thang Jan 18 '13 at 09:24
  • (Using later Matlab versions and another operating system) I can't reproduce this odd behaviour. If it can really be reproduced over and over again on your machine, what happens if you perform similar tests without the Matlab desktop and, furthermore, without the JVM running in the background? Besides, what happens if you replace the "importdata" commands with "load" commands? What happens if you use options like '-v7.3' in the save commands? – s-m-e Jan 18 '13 at 16:24
  • If it is really the name of the folder that makes the difference, and since you're running on Windows, one more idea: antivirus software. I guess Matlab is unlikely to be causing this, but antivirus software does behave strangely on certain occasions. It could be some sort of on-write or on-access scan that behaves differently when applied to a temporary folder. – s-m-e Jan 18 '13 at 16:29
  • Please either close the question or use a portion of your modified question as an actual answer. – jml Jan 25 '13 at 21:57
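
The checks suggested in the comments could be set up roughly as follows. This is only a sketch under assumptions: it writes to a scratch directory obtained from tempname instead of the asker's AnyFolder, it uses load rather than importdata, and the names baseDir, folders, nRuns and loadTime are hypothetical. It times the load that follows each save, once for a folder called temp and once for a folder called temporary.

```matlab
% Hypothetical scratch location; the original post used 'AnyFolder'.
baseDir = tempname;
mkdir(baseDir);

% Small stand-in for anyData.mat so the sketch is self-contained.
data = rand(92, 240);
dataFile = fullfile(baseDir, 'anyData.mat');
save(dataFile, 'data');

folders = {'temp', 'temporary'};   % the two folder names being compared
nRuns = 5;                         % repetitions per folder
loadTime = zeros(numel(folders), nRuns);

for f = 1:numel(folders)
    saveDir = fullfile(baseDir, folders{f});
    mkdir(saveDir);
    for run = 1:nRuns
        % Save into the folder under test...
        saveFile = ones(92, 240);
        save(fullfile(saveDir, 'saveFile.mat'), 'saveFile');

        % ...then time the next load, as in the original example.
        tic
        T = load(dataFile);
        loadTime(f, run) = toc;
    end
end

% One median load time (in seconds) per folder name.
disp(median(loadTime, 2));
```

Running the same script with the Matlab desktop and JVM disabled, or with '-v7.3' added to the save calls, would cover the remaining suggestions.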

2 Answers

There are two things here:

  1. Saving to disk inside a for loop is an expensive operation, as each save usually opens a file stream and closes it again before the loop moves on. You might not be able to avoid this.
  2. The second thing is the speed of the storage and of its cache. Most likely, programs use the temp folder for their own temporary files, and a garbage collector or other software looks after these files and cleans them up. If you repeatedly open and close file streams in this folder, each write has to request exclusive write access to the folder. This again adds to the time.

If you are doing image processing operations on multiple images, you can run into a bottleneck when writing to the hard drive, due to its speed, its cache and the memory currently available to MATLAB.
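
One way to reduce the number of file streams opened inside the loop is to collect the inner-loop results in memory and save once per outer iteration. The following is a minimal sketch, not the asker's actual script: the names outDir, nDays, nTypes and dayResult are hypothetical, the calculation is a placeholder, and it assumes that one day's results fit in memory.

```matlab
% Hypothetical scratch folder for this demonstration.
outDir = tempname;
mkdir(outDir);

nDays  = 3;    % the script in the question loops over 365 days
nTypes = 17;

for day = 1:nDays
    % One in-memory buffer per day instead of one file per (day, type) pair.
    dayResult = zeros(nTypes, 92, 240);

    for type = 1:nTypes
        % Placeholder for the real calculation.
        dayResult(type, :, :) = type * ones(92, 240);
    end

    % A single save per outer iteration: nDays file writes in total,
    % rather than nDays * nTypes.
    save(fullfile(outDir, sprintf('day%03d.mat', day)), 'dayResult');
end
```

Whether this helps depends on how much of the time is really spent opening and closing files, so it is worth profiling before and after the change.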

Farrukh Subhani

I can't reproduce the problem, and suspect it's specific to your system and data size. But here are some general comments which could help you out of the predicament:

As pointed out by the commenters and the answer above, file I/O within a double for loop can be extremely parasitic, especially in cases where you only need to access part of the data in the file, where other system operations delay the process, or where the data files are large enough to require virtual memory (Windows) / swap space (Linux) to even load them. In the latter case, you could be in a situation where you're moving a file from one part of the hard disk to another when you open it!

I assume that you're loading/saving because you don't have c. 10 GB of RAM to hold everything in memory for computation. The actual problem is not described, so I can't be certain, but I think you might find the matfile class useful (see the TMW documentation). It is used to map directly to/from a MAT-file. This:

  • reduces file stream opening and closing IOPS

  • allows arbitrarily large variable sizes (governed by disk size, not memory)

  • allows you to read/write partially (i.e. write only some elements of an array without loading the whole file)

  • in the case that your mat file is too large to be held in memory, avoids loading it into swap space which would be extremely cumbersome.
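
A minimal sketch of partial access with matfile, under assumptions: the file name comes from tempname, the variable name result is hypothetical, and the file is created with '-v7.3' because, as far as I know, matfile only reads and writes parts of a variable efficiently for that MAT-file version.

```matlab
% Hypothetical scratch file for this demonstration.
fileName = [tempname '.mat'];

% Create a version 7.3 MAT-file holding an 8x92x240 array.
result = zeros(8, 92, 240);
save(fileName, 'result', '-v7.3');

% Map the file; 'Writable' must be true to allow assignment.
m = matfile(fileName, 'Writable', true);

% Write a single 1x92x240 slice without loading the rest of the array.
m.result(3, :, :) = ones(1, 92, 240);

% Read one slice back; only this part is brought into memory.
slice = m.result(3, :, :);

% Query the size of the stored variable without loading it.
dims = size(m, 'result');
```

In a loop like the one in the question, the matfile object would be created once outside the loop and indexed inside it, so no separate load or save call is needed per iteration.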

Hope this helps.

Tom

thclark