
On a case-insensitive file system, such as NTFS or HFS+, given the name of a file, what is the most efficient way to determine the case-preserved version of the file name?

Consider this on HFS+ (Mac OS X):

> perl -E 'say "yes" if -e "/TMP"'
yes

It says it exists, of course, but I have no idea how its case is preserved. What's the most efficient way to determine the actual case?

What I've tried so far:

  • glob with character classes: It doesn't work on Windows:

    > perl -E "say for glob 'C:\\Perl'"
    C:\Perl
    > perl -E "say for glob 'C:\\[Pp][Ee][Rr][Ll]'"
    

    Note the lack of output from that last command. :-(

  • opendir/readdir: Works, but seems rather inefficient to read an entire directory:

    > perl -E "opendir my $dh, 'C:\\'; say for grep { lc $_ eq 'perl' } readdir $dh; closedir $dh"
    Perl
    

Is it crazy to think that there ought to be some core operating system instructions or something to get at this information more efficiently?

theory

3 Answers


On Windows,

>perl -MWin32 -E"say Win32::GetLongPathName($ARGV[0])" "C:\PROGRAM FILES"
C:\Program Files

>perl -MWin32 -E"say Win32::GetLongPathName($ARGV[0])" C:\PROGRA~1
C:\Program Files

On Mac OS X, fcntl's F_GETPATH command will do. (It is not available on every Unix; Linux, for one, has no such fcntl command.)
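A minimal sketch that wraps both OS-level calls in one function, under stated assumptions: the name os_case_preserved_path is hypothetical; F_GETPATH is assumed to be 50 and MAXPATHLEN 1024, matching Darwin's system headers, and both are hard-coded because Perl's Fcntl module does not export F_GETPATH (as the comments below point out). On any other platform it returns undef, so a caller still needs a fallback such as an opendir/readdir scan.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);

# Assumption: values from Darwin's <sys/fcntl.h> and <sys/param.h>.
# Fcntl does not export F_GETPATH, so it is hard-coded here.
use constant DARWIN_F_GETPATH  => 50;
use constant DARWIN_MAXPATHLEN => 1024;

# Hypothetical helper: ask the OS for the case-preserved spelling of an
# existing path. Returns undef on unsupported platforms or on failure.
sub os_case_preserved_path {
    my ($path) = @_;

    if ($^O eq 'MSWin32') {
        # Win32::GetLongPathName also expands 8.3 names such as C:\PROGRA~1.
        return unless eval { require Win32; 1 };
        return Win32::GetLongPathName($path);
    }

    if ($^O eq 'darwin') {
        sysopen(my $fh, $path, O_RDONLY) or return;
        # F_GETPATH writes a NUL-terminated path into the buffer passed in,
        # so preallocate it to MAXPATHLEN bytes.
        my $buf = "\0" x DARWIN_MAXPATHLEN;
        my $ok  = fcntl($fh, DARWIN_F_GETPATH, $buf);
        close $fh;
        return unless $ok;
        $buf =~ s/\0.*\z//s;    # trim at the terminating NUL
        return $buf;
    }

    return;    # no single call known for this platform
}
```

On Windows, os_case_preserved_path('C:\\PROGRA~1') should give the same C:\Program Files as the one-liners above; on Linux it simply returns undef.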

ikegami
  • "F_GETPATH" is not exported by the Fcntl module – theory Feb 19 '15 at 19:19
  • Since you're going to have to write some C to get the constant, you might as well write the function in C. – ikegami Feb 19 '15 at 20:02
  • I'm not gonna add C to Pod::Simple. My dumb heuristic will have to do for now. – theory Feb 19 '15 at 21:25
  • Huh? A second ago, you were ok with it. (What do you think `GetLongPathName` is written in?) If you want to access the system, it's rather hard to avoid C. – ikegami Feb 20 '15 at 05:44
  • I'm fine to use modules written in C, but have no plans to add C/XS to Pod::Simple. If F_GETPATH were in the Fcntl module, I might use it. But it's not. – theory Feb 20 '15 at 19:24
  • So you don't mind using the C/XS code, as long as it's not in Pod::Simple... Who said anything about putting it in Pod::Simple? – ikegami Feb 20 '15 at 20:14
  • If I don't put it in Pod::Simple, since Pod::Simple is in core, I need to use something that's in core. Sounds like there isn't anything unless someone writes a module and gets it into core. But maybe I'm missing something? (Seems likely…) – theory Feb 20 '15 at 21:06
  • @theory, Or add the constant to Fcntl – ikegami Feb 20 '15 at 21:09
  • And then it would work only in the latest version of Perl to include that change. – theory Feb 20 '15 at 21:26
  • Right, ok, so the constant approach won't work (on its own). Note that Win32.pm isn't in core either. – ikegami Feb 20 '15 at 21:31
  • Yeah, so I think [my solution](http://stackoverflow.com/a/28614580/79202) is good enough for Pod::Simple. – theory Feb 20 '15 at 21:32
  • Just fall back to returning the input rather than nothing when there's no match. – ikegami Feb 20 '15 at 21:40

The opendir/readdir/grep solution is the proper one. Via Twitter, Neil Bowers points to this quotation from perlport:

Don't count on filename globbing. Use opendir, readdir, and closedir instead.

@miyagawa, also via Twitter, says that there is no system call for this, and if there was, it wouldn't be portable.

And given that @mob's answer and comments from David Golden suggest that glob would be more expensive than opendir/readdir anyway, there seems to be no other way around it.

So here's the function I wrote to find all the cases for a given basename in a directory:

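use File::Spec;

# Return every entry in $dir whose name equals $fn ignoring case,
# spelled as stored on disk and joined back onto $dir.
sub _actual_filenames {
    my $dir = shift;
    my $fn  = lc shift;
    opendir my $dh, $dir or return;
    my @found = map { File::Spec->catfile($dir, $_) }
        grep { lc $_ eq $fn } readdir $dh;
    closedir $dh;
    return @found;
}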
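_actual_filenames looks at a single directory and returns an empty list when nothing matches. A minimal sketch, assuming you want a whole path resolved and the input spelling kept whenever a component can't be found (the fallback ikegami suggests in the comments on the first answer), applies the same opendir/readdir comparison to each path component in turn. actual_case_path is a hypothetical name, not part of Pod::Simple or any other module.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: resolve every component of a path to its on-disk
# spelling with opendir/readdir, one directory level at a time. When a
# directory can't be read or nothing matches, the caller's spelling is
# kept for that component instead of returning nothing.
sub actual_case_path {
    my ($path) = @_;
    $path = File::Spec->rel2abs($path);
    my ($vol, $dirs, $file) = File::Spec->splitpath($path);
    my @parts = grep { length } File::Spec->splitdir($dirs), $file;

    my $so_far = $vol . File::Spec->rootdir;
    for my $part (@parts) {
        my $found = $part;    # default: keep the input spelling
        if (opendir my $dh, $so_far) {
            my $want = lc $part;
            my @hits = grep { lc $_ eq $want } readdir $dh;
            closedir $dh;
            # Prefer an exact match: on a case-sensitive file system
            # both "Foo" and "foo" may exist.
            my ($exact) = grep { $_ eq $part } @hits;
            $found = $exact // $hits[0] // $part;
        }
        $so_far = File::Spec->catfile($so_far, $found);
    }
    return $so_far;
}
```

It still reads every directory along the path in full, so it is no cheaper than the single-directory version. readdir also does not list 8.3 short names, so a component such as PROGRA~1 comes back unchanged rather than expanded.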
theory
  • Fails for `C:\PROGRA~1` on Windows. At the very least, it should return `C:\PROGRA~1`, but `C:\Program Files` would be far better. – ikegami Feb 19 '15 at 18:56

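The glob function uses csh-style wildcard expansion (File::Glob), not regular expressions, and bracketed character classes such as [Pp] are part of that syntax. On Windows, though, File::Glob treats a backslash directly before a glob metacharacter such as [ or { as an escape, so C:\[Pp][Ee][Rr][Ll] looks for a name beginning with a literal [. For your example task, use a forward slash as the separator:

glob("C:/[Pp][Ee][Rr][Ll]")

Brace alternation such as C:/{P,p}{E,e}{R,r}{L,l} is not a substitute: braces are expanded without consulting the file system, so that pattern returns all 16 spellings whether or not they exist.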

I don't know the implementation details of glob, but it seems like it would also need to examine every file in a directory, and would not necessarily be more efficient than your readdir/grep idiom.

Or more concisely (again, not necessarily more efficiently), a glob/grep idiom:

perl -E "say for grep {/^C:\\PERL$/i} glob('C:\*')"
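The glob/grep idiom can be wrapped in a small function. A minimal sketch, with glob_case_matches as a hypothetical name, assuming $dir uses forward slashes and contains no glob metacharacters. It uses File::Glob's bsd_glob, which, unlike glob, does not split its pattern on whitespace, and it compares whole base names so that looking up perl does not also return perl5.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Glob qw(bsd_glob);

# Hypothetical helper: list the entries of $dir whose base name equals
# $name ignoring case, spelled as stored on disk.
# Assumptions: $dir uses forward slashes and has no glob metacharacters.
# Note that "*" does not match names starting with ".", so dot files
# are never returned.
sub glob_case_matches {
    my ($dir, $name) = @_;
    my $want = lc $name;
    # Compare whole base names, not substrings of the full path.
    return grep { lc(basename($_)) eq $want } bsd_glob("$dir/*");
}
```

Like the readdir version, this still enumerates the whole directory; it only hides the loop inside glob.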

(updated, still not tested)

mob