
When a directory contains many files (around 4000), the dir() function is very slow. My guess is that it creates a structure and fills in the values inefficiently.

Are there any fast and elegant alternatives to using dir()?

Update: Tested on 64-bit Windows 7 with MATLAB R2011a.

Update 2: It takes around 2 seconds to complete.

chappjc
nimcap

4 Answers


Which CPU / OS are you using? I just tried it on my machine with a directory with 5000 files and it's pretty quick:

>> d=dir;
>> tic; d=dir; toc;
Elapsed time is 0.062197 seconds.
>> tic; d=ls; toc;
Elapsed time is 0.139762 seconds.
>> tic; d=dir; toc;
Elapsed time is 0.058590 seconds.
>> tic; d=ls; toc;
Elapsed time is 0.063663 seconds.
>> length(d)

ans =

        5002

Another alternative to MATLAB's ls and dir functions is to use Java's java.io.File directly from MATLAB:

>> f0=java.io.File('.');
>> tic; x=f0.listFiles(); toc;
Elapsed time is 0.006441 seconds.
>> length(x)

ans =

        5000
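
To repeat this comparison on another directory, a minimal sketch along the following lines may help. The names folder and nRuns are placeholders chosen for illustration, not part of the measurements above. The Java array is converted to a cell array inside the timed region, so the conversion cost is included in the comparison.

```matlab
% Sketch: time dir() against java.io.File for a single folder.
folder = pwd;   % assumption: the folder to list (placeholder)
nRuns  = 5;     % assumption: number of repetitions (placeholder)

tDir  = zeros(1, nRuns);
tJava = zeros(1, nRuns);
for k = 1:nRuns
    % MATLAB's dir: returns a struct array, usually including '.' and '..'
    t0 = tic;
    d = dir(folder);
    namesDir = {d.name};
    tDir(k) = toc(t0);

    % Java: list() returns a java.lang.String[] without '.' and '..'
    t0 = tic;
    jNames = java.io.File(folder).list();
    namesJava = cell(jNames);   % convert the Java array to a cell array
    tJava(k) = toc(t0);
end

fprintf('dir:          median %.4f s\n', median(tDir));
fprintf('java.io.File: median %.4f s\n', median(tJava));
```

Taking the median over several runs reduces the influence of a slow first call, which can be noticeable on network drives.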
Jason S

I confirmed Jason S's suggestion on a networked drive, using a directory containing 363 files (Windows 7 64-bit, MATLAB R2011a).

Both foo and bar below yield the same cell array of filenames (verified using MD5 hashing of the data), but bar using Java takes significantly less time. Similar results are seen if I generate bar first and then foo, so this isn't a network caching phenomenon.

>> tic; foo=dir('U:\mydir'); foo={foo(3:end).name}; toc
Elapsed time is 20.503934 seconds.
>> tic;bar=cellf(@(f) char(f.toString()), java.io.File('U:\mydir').list())';toc
Elapsed time is 0.833696 seconds.
>> DataHash(foo)
ans =
84c7b70ee60ca162f5bc0a061e731446
>> DataHash(bar)
ans =
84c7b70ee60ca162f5bc0a061e731446

where cellf = @(fun, arr) cellfun(fun, num2cell(arr), 'uniformoutput',0); and DataHash is from http://www.mathworks.com/matlabcentral/fileexchange/31272. I skip the first two elements of the array returned by dir because they correspond to . and ...
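
Dropping the first two entries relies on . and .. being listed first, which may not hold when some file names begin with characters that sort before '.', such as '!' or '#'. A slightly more defensive sketch filters those two entries by name instead. The variable folder is a placeholder for illustration.

```matlab
% Sketch: remove '.' and '..' from a dir() listing by name, not by position.
folder = pwd;                 % assumption: the folder to list (placeholder)
d = dir(folder);
names = {d.name};             % cell array of all entry names
names = names(~ismember(names, {'.', '..'}));   % keep everything except '.' and '..'
```

This should give the same result as foo(3:end) whenever . and .. do come first, and it also handles directories where dir does not return them at all.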

Ahmed Fasih
  • Thank you! Instead of cellf, you can use `arrayfun` directly: `bar = arrayfun(@(f) char(f.toString()), java.io.File('U:\mydir').list(),'UniformOutput',false);` – jarondl Jan 26 '15 at 13:42

%Example: list files and folders

Folder = 'C:\'; %can be a relative path
jFile = java.io.File(Folder); %java file object
Names_Only = cellstr(char(jFile.list)) %cellstr
Full_Paths = arrayfun(@char,jFile.listFiles,'un',0) %cellstr

%Example: list files (skip folders)

Folder = 'C:\';
jFile = java.io.File(Folder); %java file object
jPaths = jFile.listFiles; %java.io.File objects
jNames = jFile.list; %java.lang.String objects
isFolder = arrayfun(@isDirectory,jPaths); %boolean
File_Names_Only = cellstr(char(jNames(~isFolder))) %cellstr

%Example: simple filter

Folder = 'C:\';
jFile = java.io.File(Folder); %java file object
jNames = jFile.list; %java string objects
Match = arrayfun(@(f)f.startsWith('page')&f.endsWith('.sys'),jNames); %boolean
cellstr(char(jNames(Match))) %cellstr

%Example: list all class methods

methods(handle(jPaths(1)))
methods(handle(jNames(1)))
Sergey K

You can try ls. It returns only the file names, as a character array. I haven't tested whether it is faster than dir.

UPDATE:

I checked on a directory with over 4000 files. Both dir and ls give similar results: about 0.34 seconds, which I think is not bad. (MATLAB R2011a, Windows 7 64-bit)

Is your directory located on a local hard drive or on a network drive? Maybe defragmenting the hard drive will help.

yuk