13

I am using exist(x, 'file') to check for the existence of a file on my machine. The execution of this command takes FOREVER (over 10 seconds per call!).

My matlabpath is not too long (about 200 entries) and all folders on path are on my local drive (no network).

  1. Why does exist take forever?
  2. Is there a way to make it run FASTER?

PS,
This call to exist is part of Matlab's execution of loadlibrary. So, if you are calling loadlibrary and you don't know why it takes forever - this question is also for you.

Amro
Shai
  • Just out of curiosity, what is the value of `x`? – Eitan T Apr 03 '13 at 11:45
  • Just in case, have a look at this problem I had a while ago. If you are writing to a file which is in your matlab path before calling `exist` this could cause a problem: http://stackoverflow.com/questions/15386917/why-does-writing-to-an-unrelated-file-cause-the-load-function-to-be-so-slow – devrobf Apr 03 '13 at 11:53
  • Also; not an answer exactly, but if you can download `existfile` it will probably solve the issue: http://www.mathworks.co.uk/matlabcentral/fileexchange/13775-multicore-parallel-processing-on-multiple-cores/content/existfile.m – devrobf Apr 03 '13 at 11:54
  • @jazzbassrob I'm afraid the `exist` is part of code I cannot change. Thus the `existfile` solution is not applicable for me. But thanks anyhow. – Shai Apr 03 '13 at 11:58
  • @EitanT - This is all part of `loadlibrary`. The file `x` is a header file that exists on my path. – Shai Apr 03 '13 at 12:06
  • 1
    `exist` takes forever because file access in Matlab is slow. The only way I've found to make it run faster is to replace it (e.g. to check for a directory, I've a function that tries to `cd` instead). – Jonas Apr 03 '13 at 12:25
  • In my experience, `exist` has always been quite fast. It seems like something is wrong. Are you (or the code you are using) doing anything with [change notification handles](http://www.mathworks.com/support/solutions/en/data/1-18IFI/)? Those settings could affect performance. – shoelzer Apr 03 '13 at 14:29
  • @shoelzer - I don't think CNH applies in my case since all folders in my path are local (c:\). – Shai Apr 03 '13 at 14:34
  • @Shai Yes, you are correct. So is `exist` slow when you call it yourself, or only from within `loadlibrary`? Is it still slow when you reduce the number of dirs in `matlabpath`? Do you have a huge number of header files in the same dir as the file you are looking for? I'm not sure what the problem could be, so these are just ideas to maybe figure it out. – shoelzer Apr 03 '13 at 14:46
  • I know this is typically something you DO NOT WANT to do, but perhaps it can do the trick in this specific case: If `exist` is in code you cannot change, and is only used to check for files, then you could perhaps overload it and call `existfile` anyway. – Dennis Jaheruddin Apr 03 '13 at 15:38
  • @DennisJaheruddin - I was thinking along the same lines... The problem is `exist` is used too many times and not only for files... bummer :-( – Shai Apr 03 '13 at 15:41
  • On second thought, you could let your overloaded function check the second input argument to decide whether you want to use `existfile`, and otherwise use the regular `exist`. – Dennis Jaheruddin Apr 03 '13 at 15:46
  • 2
    200 path entries sounds like kind of a lot. What OS are you on? You could trace the program's system calls to see what it's doing, for example, with Sysinternals' Process Monitor on Windows. It'll show you all the file accesses and their durations, which may give you a lead. – Andrew Janke Apr 07 '13 at 03:23

4 Answers

21

Here's one idea. You could put the directory containing those header files up at the front of the MATLAB path, so when exist() goes looking through the path, it finds them quickly and doesn't have to search through the rest of the entries. If it's spending its time stepping through your path, that may help.
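A minimal sketch of that idea, assuming the header files live in a single folder. The folder name and header name below are hypothetical placeholders, not taken from the question. `addpath` with the `'-begin'` flag puts the folder at the front of the search path, and timing a single call shows whether the change made any difference.

```matlab
% Hypothetical folder that holds the header files; replace with your own.
headerDir = 'C:\myproject\include';

% Put that folder at the front of the MATLAB search path.
addpath(headerDir, '-begin');

% Confirm that it is now the first path entry.
entries = strsplit(path, pathsep);
disp(entries{1});

% Time one lookup ('myheader.h' is a hypothetical file name).
tic;
exist('myheader.h', 'file');
toc
```

If the call is still slow with the folder first on the path, the time is probably being spent somewhere other than the path walk.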

Peter Mortensen
Andrew Janke
  • I'll certainly give it a try! – Shai Apr 07 '13 at 05:41
  • Try the procmon thing, too, if you're on Windows - I suspect you might actually be running into Change Notification Handle limitations. Even though they're all local dirs, it might still be an issue simply because there are so many of them. – Andrew Janke Apr 07 '13 at 05:51
  • Thank you for the guidance and tips. I finally nailed it - too many files in %TEMP% folder... Your advice was very helpful. – Shai May 07 '13 at 07:30
  • 6
    Another +1 from http://meta.stackexchange.com/questions/179409/how-can-i-award-a-user-with-reputation-apart-from-trivial-upvoting-accept. – Andrew Cheong May 08 '13 at 21:20
  • 1
    @AndrewJanke I hope it's not too late to say "thank you" :) – Shai Jan 06 '16 at 06:56
  • @Shai Never too late. Thanks for the thanks, and hope this helped you with your work. :) – Andrew Janke Jan 06 '16 at 06:57
18

Wow! That was a tough one. Bottom line: Delete %TEMP% files!

I had a few thousand files lying around in %TEMP%. It appears MATLAB really likes to go over and over the TEMP directory.

After clearing the TEMP folder, exist runs in no time!

(Thanks Andrew for the Process Monitor advice!)
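To check whether you are in the same situation, here is a rough sketch that counts the entries in the folder MATLAB reports as its temporary directory and times one `exist` call. It assumes `tempdir` points to the same location as %TEMP%, and the header name is a hypothetical placeholder.

```matlab
% Folder that MATLAB uses for temporary files.
tmp = tempdir;

% Count the entries in it, ignoring the '.' and '..' pseudo-entries.
listing = dir(tmp);
nEntries = sum(~ismember({listing.name}, {'.', '..'}));
fprintf('%s contains %d entries\n', tmp, nEntries);

% Time one lookup ('myheader.h' is a hypothetical file name).
tic;
exist('myheader.h', 'file');
toc
```

Running it once before and once after cleaning the folder gives a simple comparison.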

Peter Mortensen
Shai
1
  1. exist is a built-in Matlab function. It is designed to check for the existence of other kinds of objects (such as variables in Matlab) as well as files. Because it is built in, it is not simple to see how it is coded. At least on Windows, when you call exist('filename','file') it seemingly makes only one API call to the operating system to check whether the file exists. So either the operating system is taking a long time, or there is some bloat in the exist function that makes it run slowly. See the solutions from the other posters for ideas on how to make the operating system return its result more quickly.

  2. People sometimes complain that running exist('filename','file') in a loop makes the loop very slow. This is because each call takes perhaps milliseconds and the loop runs a few thousand times. The solution here is to replace

    if exist('filename','file')
      % your code
    end

with

    if java.io.File('filename').exists()
      % your code
    end
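One caveat with the swap: the two checks are not equivalent. `exist(...,'file')` also searches the MATLAB path, while `java.io.File` looks only at the one location it is given and resolves relative names against the JVM's working directory, which may differ from MATLAB's current folder. A small wrapper can make that explicit. The sketch below is one way to do it, and `fileExistsJava` is a hypothetical name, not an existing function.

```matlab
function tf = fileExistsJava(name)
% fileExistsJava  Hypothetical helper: check one specific location via Java.
%   Unlike exist(name,'file'), this does NOT search the MATLAB path.
%   Like exist, it returns true for folders as well as files.

f = java.io.File(name);

% Resolve relative names against MATLAB's current folder rather than
% the JVM's working directory.
if ~f.isAbsolute()
    f = java.io.File(fullfile(pwd, name));
end

tf = f.exists();
end
```

Saved as fileExistsJava.m, it can be called as `fileExistsJava('C:\data\input.txt')`. In MATLAB R2017b and later, the built-in `isfile` and `isfolder` functions are another option worth timing.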
Bob Mortimer
  • an interesting approach to use the `java` API for this task. Can you confirm (using, e.g., profiler) that the java approach is indeed faster? – Shai Jan 05 '16 at 15:03
0

For 372 files:

  • Matlab: Elapsed time is 40.207266 seconds. (Time to get a cup of tea.)
  • Java: Elapsed time is 0.122165 seconds. (The blink of an eye.)
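The answer does not show how these timings were taken. A comparison along these lines could be set up as in the sketch below, which is an assumption about the method rather than the original test. The file set (all `.m` files in the current folder) is a hypothetical stand-in for the 372 files.

```matlab
% Hypothetical file set: every .m file in the current folder.
files = dir(fullfile(pwd, '*.m'));
names = cellfun(@(n) fullfile(pwd, n), {files.name}, ...
                'UniformOutput', false);

% Time the built-in exist.
tic
for k = 1:numel(names)
    found = exist(names{k}, 'file');
end
toc

% Time the Java check on the same absolute paths.
tic
for k = 1:numel(names)
    f = java.io.File(names{k});
    found = f.exists();
end
toc
```

Full paths are used in both loops so that each check looks at the same location. Absolute numbers will vary with the machine and with the state of the path and %TEMP%.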