
I'm in maintenance mode and I'm working with a Perl script that runs on Apple, Linux, Windows and Unix. Some Apple and Linux systems, and most Windows systems, have spaces in the path. On Windows, the long file name needs quotes. On Apple and Linux, the space needs a backslash. If there's no space, then nothing needs to be done.

Perl's File::Copy and File::Spec are aware of system differences and abstract them for different file systems. Looking through the other File functions, I don't see which one normalizes or canonicalizes a pathname by adding quotes, adding slashes, moving quotes around, etc. as required.

The minimum required Perl version is v5.10, so I should be able to rely on at least v5.10 without any trouble.

What is the Perl function to normalize or canonicalize a path with spaces?


Here's an oversimplified example on Windows:

use File::Spec::Functions qw(catdir catfile);

my $testcat = catfile(catdir("\"C:\\Program Files\"", "My Program"), "test.txt");
print "Test cat: $testcat\n";

The result is the following. Notice the quoting is not right and the path separator is wrong.

Test cat: "C:/Program Files"/My Program/test.txt

Here is what I expect on a Windows system (or an error):

Test cat: "C:\Program Files\My Program\test.txt"

There are similar questions, but they all seem to be one-offs. For example, "How to handle filenames with spaces?" says to manually add quotes for Windows. I'm looking for the Perl routines to do it.

jww
  • 1
    Are you talking about generating paths for use in shell commands? – ThisSuitIsBlackNot Mar 27 '16 at 15:32
  • 1
    @ThisSuitIsBlackNot - they are used in two ways. First, the path is sometimes used to call another program, like NSAM. In this case, there are no options; I'm only asking PERL to build the pathname correctly for me. I'll handle the options. Second, it's adding include directories and library directories for a Makefile. This is the case that is causing me the most trouble. Currently the logic splits on the space, so a path like `C:\Program Files\My Program` is being tokenized into `-IC:\Program` and `-IFiles\My` and `-IProgram`. This is the case I am actively working on. – jww Mar 27 '16 at 15:36
  • 1
    As far as I know you need to do nothing more than enclose the path in double quotes. Unless the path contains quotes or dollar signs that will work everywhere I can think of – Borodin Mar 27 '16 at 16:50
  • @Borodin - PERL does not concatenate quoted path and filename properly. I'm guessing it probably won't handle two quoted paths. – jww Mar 27 '16 at 17:38
  • 1
    @jww: That depends on the context. You should build your path string correctly and *then* enclose it in quotes to pass it to whatever. You should always use [File::Spec](https://p3rl.org/File::Spec) or [File::Spec::Functions](https://p3rl.org/File::Spec::Functions) to work with file paths. You need to say more about what you're doing – Borodin Mar 27 '16 at 18:41
  • 1
    Oh, and it's ***Perl***, like *Python* and *Ruby*. It's not an acronym, even though you will read about people having invented one for it after the fact. It was originally going to be called *Pearl* but there was already a language called *PEARL* (which *is* an acronym!) – Borodin Mar 27 '16 at 19:07
  • 1
    File::Spec::Win32 and File::Spec::Unix do no validation in `catdir`, `catfile`, and `canonpath`; they simply do a series of substitutions on what you pass them. If you pass an invalid path component (e.g. one containing a null byte on *nix or [a double quote on Windows](https://msdn.microsoft.com/en-us/library/aa365247), I think), you'll get an invalid result: garbage in, garbage out. I agree that File::Spec should warn about this; it's been in core for a long time, so maybe there's a reason it doesn't warn. I would file a bug report/feature request. – ThisSuitIsBlackNot Mar 28 '16 at 16:24
  • Ah, thanks ThisSuitIsBlackNot. I presumed Perl knew what it was doing. It suited me well because the quoted path came in as a command line argument, and the baked output needed it, too. Perl did not produce an error, so I thought it knew what it was doing. I presumed I was lacking a call to a function that normalizes or canonicalizes. Live and learn... (I'm going to try some of the other, more egregious characters, like `+` and `*`, to see if they are mishandled, too). – jww Mar 28 '16 at 16:30
  • @ThisSuitIsBlackNot - perhaps you should answer since you seem to have the complete picture. Others will likely land here through a search, and it will likely benefit them to know the handling behaviors. – jww Mar 28 '16 at 16:33
  • I'll throw together an answer if I get some time, although I think Borodin's answer shows how you have to do this. As for `+` and `*`, you can pass any characters you want and File::Spec::Win32 and File::Spec::Unix will happily pass them through; see the [source](https://metacpan.org/source/RJBS/PathTools-3.62/lib/File/Spec/Unix.pm#L102) for the pure-Perl version of `File::Spec::Unix::cat_dir`, for example, which just does a `join` on `/`. – ThisSuitIsBlackNot Mar 28 '16 at 17:00
  • Borodin - sorry, I have no idea what you are talking about. – jww Mar 30 '16 at 10:11
  • Thinking about this more, `catdir`, `catfile`, etc. probably don't warn about this because which characters are valid in filenames [depends on the underlying filesystem](https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations). `catdir` and `catfile` don't examine the filesystem, so there's no way they could generate an accurate warning. – ThisSuitIsBlackNot Mar 30 '16 at 21:43

2 Answers

3

I'm not sure how you managed to get the output you describe. On Windows I get:

Test cat: "C:\Program Files"\My Program\test.txt

On OSX, I get:

Test cat: "C:\Program Files"/My Program/test.txt

Which OS and version of Perl are you using? Is it possible you left out some relevant parts of your script?


Your example shows confusion about quoting and escaping strings in Perl. It might help to break it down into smaller pieces to see what's going on and put the pieces together later:

print "\"C:\\Program Files\""

"C:\Program Files"

This is probably what you expected. It uses a double-quoted (interpolating) string with backslash escapes to build the string you want to use. Note: you can simplify this statement by using a single-quoted, non-interpolating string:

print '"C:\Program Files"'

Appending the directory, you start using File::Spec:

use File::Spec::Functions;
print catdir('"C:\Program Files"', "My Program")

"C:\Program Files"\My Program

This is where things get funky. catdir expects a list of directories, but you provided a string that is almost certainly not a directory as the first item in the list.
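To make the pass-through behaviour concrete, here is a minimal sketch that calls the Win32 flavour of File::Spec directly, so it applies Windows rules on whatever OS runs it. The segments are the ones from the question; `join_win32` is a hypothetical helper name, not part of File::Spec.

```perl
use strict;
use warnings;
use File::Spec::Win32;   # pure Perl, so it loads on any platform

# Hypothetical helper: join segments with Windows rules regardless of host OS.
sub join_win32 { return File::Spec::Win32->catfile(@_) }

# Quotes are treated as ordinary characters and stay exactly where they were.
my $quoted = join_win32('"C:\Program Files"', 'My Program', 'test.txt');

# Plain segments give a plain path that can be quoted as a whole later.
my $plain  = join_win32('C:\Program Files', 'My Program', 'test.txt');

print "Quoted input: $quoted\n";
print "Plain input:  $plain\n";
```

The first result keeps the embedded quotes around only the first segment, which matches the Windows output shown earlier in this answer; the second is a well-formed path.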

Given you prepended the directory with the C:\ volume, there's a good chance you actually want to use the catpath function:

  • catpath()

    Takes volume, directory and file portions and returns an entire path. Under Unix, $volume is ignored, and directory and file are concatenated. A '/' is inserted if need be. On other OSes, $volume is significant.

      $full_path = File::Spec->catpath( $volume, $directory, $file );
    

The resulting string would not be directly usable on the command line if it contains spaces, because Perl makes some rather Unixish assumptions. But as answers to the related question point out, you can insert double quotes after constructing the path. As it turns out, double quotes also protect spaces on OSX and Linux; you don't need to escape each individual space.
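As a rough sketch of that order of operations (build first, quote last), the following uses the platform-specific File::Spec classes explicitly so both cases can be tried on one machine. `shell_quote` is a hypothetical helper, the `/opt/My Program` path is made up for illustration, and the sketch assumes the path contains no double quotes, dollar signs, or backticks.

```perl
use strict;
use warnings;
use File::Spec::Win32;
use File::Spec::Unix;

# Hypothetical helper: wrap an already-built path in double quotes.
# Assumption: the path has no double quotes, dollar signs, or backticks.
sub shell_quote {
    my ($path) = @_;
    return qq{"$path"};
}

# Volume, directory, and file are passed separately; no quotes yet.
my $win  = File::Spec::Win32->catpath('C:', '\Program Files\My Program', 'test.txt');

# Under Unix rules the volume argument is ignored.
my $unix = File::Spec::Unix->catpath('', '/opt/My Program', 'test.txt');

# Quotes are added only when the string is about to go on a command line.
print shell_quote($win),  "\n";
print shell_quote($unix), "\n";
```

In real code `File::Spec->catpath` would pick the right rules for the host OS; the explicit classes are only there to show both behaviours side by side.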

Alternatively, use a module designed for accomplishing whatever you are trying to do. File::Copy does a good job of addressing cross-platform concerns, for instance.

Jon 'links in bio' Ericson
  • *"I'm not sure how you managed to get the output you describe..."* - I literally took the MCVE, and plugged it into OpenSSL's [`Configure`](http://github.com/openssl/openssl/blob/master/Configure#L157) script at line 165. And I ran it on Windows 7 x64 with Strawberry Perl 5.22. – jww Mar 29 '16 at 06:03
  • 1
    That's a fairly random thing to do. Is that the script you are maintaining? What, exactly, are you trying to accomplish? Are you sure the output is _exactly_ as you included in the question? – Jon 'links in bio' Ericson Mar 29 '16 at 06:16
  • *This is where things get funky. catdir expects a list of directories...* - yeah, this is what got me. Microsoft runtimes and scripts will reject an [invalid path](http://msdn.microsoft.com/en-us/library/fyy7a5kt%28v=vs.110%29.aspx) during path building. When Perl did not reject, I presumed it was happy and mostly knew what it was doing. That was a colossal mistake... – jww Mar 29 '16 at 06:22
  • *"That's a fairly random thing to do. Is that the script you are maintaining?"* - It's not random; it's the script causing the problems. It's choking on paths with spaces. Also see, for example, [Issue 4466: NMAKE fatal error U1073: "don't know how to make 'LNAME\openssl](https://rt.openssl.org/Ticket/Display.html?id=4486&user=guest&pass=guest). It's not limited to Windows. On \*nix, I can duplicate it with *`cd /tmp; git clone git://git.openssl.org/openssl.git "openssl workspace"`*. – jww Mar 29 '16 at 06:27
  • Now open for the Linux and Unix world: [Issue 4492: Configure, Unix and Linux, and malformed command line when path includes spaces](https://rt.openssl.org/Ticket/Display.html?id=4492&user=guest&pass=guest). I lack the Perl skills to go any further with it. The mainstream devs will have to handle it. – jww Mar 29 '16 at 07:07
  • 1
    @jww: When debugging perl scripts, I rarely find the .NET documentation as useful as Perl documentation. I'm not sure about 4466, but the problem in 4492 is almost certainly with Makefile.in, so you are barking up the wrong tree. You should read: http://stackoverflow.com/questions/7039130/bash-iterate-over-list-of-files-with-spaces – Jon 'links in bio' Ericson Mar 29 '16 at 16:25
  • Thanks Jon. *"I rarely find the .NET documentation as useful as Perl documentation..."* - absolutely, agreed. The problem is, on Windows, there are some reserved characters that cannot be used in filenames. For whatever reason, Perl did not reject them. When Perl did not reject the quote, I presumed it knew how to handle paths with spaces and quoted paths. I'm guessing it will allow other illegal characters, like `:` on Apple's HFS. It seems like some of Perl's file functions are just dumb string functions repackaged. – jww Mar 30 '16 at 10:39
  • Thanks again Jon. *"... the problem in 4492 is almost certainly with Makefile.in"* - I'm familiar with makefiles, so I was able to follow up on it. I don't believe `Makefile.in` is the culprit. `find` is used regularly, and the few uses of `ls` are limited to building the tarball. Also, the include directories are built by the configure script and Perl. By the time they hit the makefile they are already mis-processed. Thanks for the tip, though. – jww Mar 30 '16 at 16:47
  • 1
    @Borodin This is kind of a lame workaround, but you can make an edit to unlock your vote. I just edited so you should be able to change your vote now. – ThisSuitIsBlackNot Mar 30 '16 at 19:06
  • @ThisSuitIsBlackNot: Hah! I was aware that an edit would unlock the vote but it never occurred to me to do it myself. Or perhaps it wouldn't have worked if it was my edit. Anyway it's fixed. Thank you – Borodin Mar 30 '16 at 20:40
2

I'm not sure why you think you need a library function to wrap something in double quotes?

You're mixing in the quotes/escapes far too early. They're only needed in certain circumstances when they are part of a longer string that will be treated as a space-separated list of substrings. The most obvious example being a command line for cmd/bash

While you're working with the string in your program you need just the plain path string without any decoration. Once you've built your path, create your command line (or whatever) with quotes around it, and it should all work
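For the include-directory case described in the question's comments, a minimal sketch of this idea might look like the following. It assumes the directories are kept as a list of plain strings and that none of them contains a double quote; `include_flags` and the second directory name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn plain include directories into -I flags,
# quoting each path only when the command line is assembled.
# Assumption: no directory name contains a double quote.
sub include_flags {
    my @dirs = @_;
    return join ' ', map { qq{-I"$_"} } @dirs;
}

# Keep the directories as a list; never split a joined string on spaces.
my @include_dirs = ('C:\Program Files\My Program', 'C:\tools\include');

print include_flags(@include_dirs), "\n";
```

Because each directory stays a separate list element until the last moment, a space inside a path cannot be mistaken for a separator between paths.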

I've never been able to get the escape character for Windows cmd (which is circumflex ^) to work reliably, so I always wrap any strings that contain space characters in double quotes. That works on Windows and any flavour of Unix, including OSX

Here's an example using the code in your question. Note that there's no need to be so careful about using catdir and catfile appropriately: unless you're building a root directory like C:\ they behave identically on systems where there is no syntactical distinction between files and directories, which includes all the platforms you mention in your question

use strict;
use warnings 'all';

use File::Spec::Functions qw/ catfile /;

my $testcat = catfile('C:\Program Files', 'My Program', 'test.txt');

print qq{Test cat: "$testcat"\n};

system qq{type "$testcat"};

output

Test cat: "C:\Program Files\My Program\test.txt"
TESTCAT CONTENTS



Update

Here's another example showing how path segments that have reached your program can be unquoted before they're used. I've defined three scalar variables. Some or all of those may have originated outside your program, while others may be defined like this, as string literals. The point is that $root is enclosed in unwanted double quotes; it is an invalid path segment and won't work if you pass it to catfile

So I've written a little subroutine unquote and applied it to all three as we're pretending we don't know which of the segments are quoted and which are not. As you can see from the output, it removes the quotes from $root but leaves the other two strings untouched. Now they're all valid and okay to pass to catfile

The output shows that catfile returns Test cat: C:\Program Files\My Program\test.txt which is what we want. Now suppose we want to type it, so we need to create the command line

type "C:\Program Files\My Program\test.txt"

In the context of the command line, the double quotes are necessary to delimit the path string, but they are not part of the path

Again, as you can see, the call to system works fine. My file contains TESTCAT CONTENTS, and that is what my program prints

I hope that helps?

use strict;
use warnings 'all';
use feature 'say';

use File::Spec::Functions qw/ catfile /;

my ($root, $dir, $file) = ( '"C:\Program Files"', 'My Program', 'test.txt');

print <<END;
Original:
Root: $root
Dir:  $dir
File: $file

END


unquote($_) for $root, $dir, $file;


print <<END;
Unquoted:
Root: $root
Dir:  $dir
File: $file

END


my $testcat = catfile($root, $dir, $file);

say "Full path: $testcat";

my $cmd = qq{type "$testcat"};
say "Command is:\n$cmd\n";

system $cmd;


sub unquote {
    $_[0] =~ s/\A"([^"]*)"\z/$1/;
    $_[0];
}

output

Original:
Root: "C:\Program Files"
Dir:  My Program
File: test.txt

Unquoted:
Root: C:\Program Files
Dir:  My Program
File: test.txt

Full path: C:\Program Files\My Program\test.txt
Command is:
type "C:\Program Files\My Program\test.txt"

TESTCAT CONTENTS
Borodin
  • @jww: You may be right, but the usual response is for you to add more explanation. You seem to be hoping for a library that will let you use path segments whether or not they are already quoted? If you gave a better example of the problem then I am sure we could help. Why does your `C:\Program Files` string contain quotes and where did they come from? Where is your compound string `$testcat` destined for? – Borodin Mar 28 '16 at 05:37
  • @jww: You also mention *“Perl ignoring the directory separator for the platform”* which seems only tenuously related to the main thrust of your question. The usual solution is to push path strings through `canonpath` to normalise them. An explanation of this, too, wouldn't hurt – Borodin Mar 28 '16 at 05:49
  • @jww: All of that is what `File::Spec` does. But you are convolving the building of a path string with processing it so that it is represented unambiguously in a variety of circumstances. In some cases it may simply need wrapping in quotes, and I think that may be all that's necessary here. But just as you would *decode* a Unicode character string on input before processing it and then *encode* it on output, you need to *unquote* your path segments on input, work with them, and then *quote* them before output – Borodin Mar 28 '16 at 06:50
  • @jww: The precise nature of that quoting and unquoting depends on the rules and requirements of the intended environment, and it would be nonsense to pick a random style of quoting and default to that in, say, `File::Spec`. I think all you need is `s/\A"(.*)"\z/$1/` on everything on input, and `qq{"$path"}` on output. I need to elaborate on that, but it's the basic idea – Borodin Mar 28 '16 at 06:53
  • 2
    @jww: No, that's wrong. `File::Spec` will do that for you. The canonical form of a file path doesn't involve escaping characters or enclosing it in quotes on any platform – Borodin Mar 28 '16 at 10:42
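The `canonpath` suggestion from these comments can be sketched as follows, again using the Win32 class explicitly so the Windows rules apply on any host. `win32_for_output` is a hypothetical helper name, and the quoting step assumes the path contains no double quotes.

```perl
use strict;
use warnings;
use File::Spec::Win32;

# Hypothetical helper: normalise separators with Windows rules,
# then wrap the result in quotes for use in a command line.
sub win32_for_output {
    my ($raw) = @_;
    my $path = File::Spec::Win32->canonpath($raw);   # no quotes added here
    return qq{"$path"};                              # quoting is a separate, final step
}

# Forward slashes in the input come back as backslashes.
print win32_for_output('C:/Program Files/My Program/test.txt'), "\n";
```

Note that `canonpath` only cleans up the path string; it never adds quotes or escapes, which is why the quoting is done separately at the end.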