
I need to read about 4000 data files, each about 400 kB. The data will be analysed later, so I read the files into a data structure. The import takes about 4 minutes; I have also tried importdata and dlmread, but there is not much difference.

Please let me know whether it is the loop, the import function, or MATLAB simply being slow at importing many files. try/catch is used because some of these files can't be read properly, but it doesn't seem to slow the script down.

Here's the script:

for k = 40020:10:75000
    try
        name = ['tmp' sprintf('%d', k)];
        c = c + 1;
        m(k).count = k;
        m(k).col = load(name);
        [val, in] = find(m(k).col(:,5) ~= 1);
        m(k).id   = m(k).col(val,1);
        m(k).posx = m(k).col(val,2);
        m(k).posy = m(k).col(val,3);
        m(k).posz = m(k).col(val,4);
    catch
        disp('Error')
    end
end
  • What kind of files? Text files containing numbers? Do you preallocate `m`? – Daniel Dec 21 '14 at 22:32
  • To figure out which line is slowing execution down, run the code with the MATLAB Profiler. If it is indeed the reading of the files that is slowing it down (which I suspect it is), then the line with the `load` command should show up as the bottleneck. – MrAzzaman Dec 21 '14 at 23:49
  • Your loop variable `k` is not sequential (it jumps by 10), so your `m` array is actually very sparse. – Shai Dec 22 '14 at 09:11
  • Have you tried [preallocating](http://stackoverflow.com/questions/21023171/variable-appears-to-change-size-on-every-loop-iteration-what) `m`? – Shai Dec 22 '14 at 09:12
  • No, I will try that now. I didn't preallocate structs before, but maybe that is slowing it down. – Sherbika Dec 22 '14 at 17:44
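For reference, one way to preallocate the struct array suggested in the comments above (a sketch only; the field names and loop range follow the question):

```matlab
% Sketch: preallocate m for all iterations before the loop.
% Assigning to the last element grows the array once, up front.
ks = 40020:10:75000;                 % file indices from the question
n  = numel(ks);
m(n) = struct('count',[], 'col',[], 'id',[], ...
              'posx',[], 'posy',[], 'posz',[]);
```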

1 Answer


A few things to note:

400 kB is not a large file.

4000 files in 4 minutes is 0.06 seconds each.

You don't appear to use the variable c.

Your struct index starts at 40020 and increases by 10 each iteration, so m is very sparse. This wastes memory and a small amount of time.
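A sketch of the same loop with a compact, sequential index instead of the sparse one (filenames and column layout as in the question; untested):

```matlab
c = 0;                                  % sequential index into m
for k = 40020:10:75000
    c = c + 1;
    name = ['tmp' sprintf('%d', k)];
    try
        data = load(name);              % ASCII load, as in the question
        rows = data(:,5) ~= 1;          % logical mask instead of find
        m(c).count = k;
        m(c).col   = data;
        m(c).id    = data(rows, 1);
        m(c).posx  = data(rows, 2);
        m(c).posy  = data(rows, 3);
        m(c).posz  = data(rows, 4);
    catch
        disp(['Could not read ' name])
    end
end
```

With this indexing, m has exactly one element per file read, rather than 75000 mostly empty slots.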

You state you used dlmread and importdata, but in the code you are using load. Are the files ASCII?

The best way to find out where the time is taken is to use the profiler.

  profile on
  % run your code
  profile viewer

Are the files local or on a network? Reading files over a network can be a lot slower.

matlabgui
  • I did profile it; yes, load is slowing it down, and there is not much difference in speed between load, importdata and dlmread. I'm going to preallocate and see whether it helps. Thanks. – Sherbika Dec 22 '14 at 17:45
  • Tried preallocating using `m = struct('count',[], 'col',[], 'id',[], 'posx',[], 'posy',[], 'posz',[])`; it didn't help! Still about 250 s. – Sherbika Dec 22 '14 at 22:14
  • How long does it take to load a single file? Is it comparable with 0.06 seconds? What format are your files ? – matlabgui Dec 22 '14 at 22:18
  • "Elapsed time is 0.071386 seconds" for loading one file. they are 5 columns of numbers. – Sherbika Dec 23 '14 at 10:55
  • You have your answer then. FYI, I expect it would be quicker if you either 1. save all your data as a single binary .mat file, or 2. save your files individually as binary .mat files. – matlabgui Dec 23 '14 at 11:04
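A one-time conversion along the lines of option 2 above might look like this (a sketch; file names follow the question, and the conversion cost is paid only once):

```matlab
% One-time pass: parse each ASCII file and save it as a binary .mat file.
for k = 40020:10:75000
    name = ['tmp' sprintf('%d', k)];
    try
        data = load(name);              % slow ASCII parse, done once
        save([name '.mat'], 'data');    % binary .mat for later runs
    catch
        disp(['Could not convert ' name])
    end
end

% Later runs then load the binary files, which is typically much faster:
% s = load('tmp40020.mat');
% data = s.data;
```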