
I have a Perl script that processes a bunch of file names, and uses those file names inside backticks. But the file names contain spaces, apostrophes and other funky characters.

I want to be able to escape them properly (i.e. not using a random regex off the top of my head). Is there a CPAN module that correctly escapes strings for use in bash commands? I know I've solved this problem in the past, but I can't find anything on it this time. There seems to be surprisingly little information on it.

brian d foy
aidan

3 Answers


If you can manage it (i.e. if you're invoking some command directly, without any shell scripting or advanced redirection shenanigans), the safest thing to do is to avoid passing data through the shell entirely.

In perl 5.8+:

my @output_lines = do {
    open my $fh, "-|", $command, @args or die "Failed spawning $command: $!";
    <$fh>;
};

If it's necessary to support 5.6:

my @output_lines = do {
    my $pid = open my $fh, "-|";
    die "Couldn't fork: $!" unless defined $pid;
    if (!$pid) {
        exec $command, @args or die "Eek, exec failed: $!";
    } else {
        <$fh>; # This is the value of the C<do>
    }
};

See perldoc perlipc for more information on this kind of business, and see also IPC::Open2 and IPC::Open3.
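As a concrete illustration of the list-form pipe open above, here is a minimal sketch that reads a file whose name contains a space, an apostrophe and shell metacharacters. The helper name `capture_noshell` and the file name are made up for this example, `cat` stands in for whatever command you actually need, and a Unix-like system with Perl 5.8+ is assumed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: run $command with @args without involving a shell
# and return its output lines (list-form pipe open, Perl 5.8+).
sub capture_noshell {
    my ($command, @args) = @_;
    open my $fh, '-|', $command, @args
        or die "Failed spawning $command: $!";
    my @lines = <$fh>;
    close $fh or die "$command failed: status $?";
    return @lines;
}

# Create a file with an awkward name (invented for this example).
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/it's a \$HOME & co.txt";
open my $out, '>', $path or die "Cannot create $path: $!";
print {$out} "hello\n";
close $out or die "Cannot write $path: $!";

# The file name travels as one argument; no quoting or escaping is needed.
my @lines = capture_noshell('cat', $path);
print @lines;
```

Because the name is never parsed by `/bin/sh`, the apostrophe, the `$HOME` and the `&` reach `cat` as literal characters.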

hobbs
  • This works well on Unix-like systems (which is all the OP asked for) and is a handy shell-less alternative to using `qx//` (backticks). Caveats for Windows users: the shell, `cmd.exe`, may _still_ get called, namely as a _fallback_ if shell-less invocation failed. Furthermore, any argument that contains _embedded_ double-quotes must be _escaped_ in order to be passed through correctly (with input argument boundaries preserved); you do that by _enclosing_ the value in _embedded_ double-quotes and by escaping the original double-quotes as required by the target program, typically as `\"`. – mklement0 Aug 25 '15 at 02:11

Are you looking for quotemeta?

Returns the value of EXPR with all non-"word" characters backslashed.

Update: As hobbs points out in the comments, quotemeta is not intended for this purpose and, on further thought, might have problems with embedded NULs. String::ShellQuote, on the other hand, croaks upon encountering embedded NULs.

The safest way is to avoid the shell entirely. Using the list form of system goes a long way towards that, and it is what I would recommend (although I found out to my dismay a few months ago that cmd.exe might still get involved on Windows).

If you need the output of the command, you are best off (safety-wise) opening a pipe yourself, as shown in hobbs' answer.
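To make the list-form recommendation concrete, here is a minimal sketch, assuming a Unix-like system. The helper name `run_noshell`, the sample string and the `EXPECTED` environment variable are invented for illustration; `$^X` (the perl running the script) stands in for the real command.

```perl
use strict;
use warnings;

# Hypothetical helper: run a program via the list form of system() and
# return its exit code, or -1 if it could not be started.
sub run_noshell {
    my ($program, @args) = @_;
    # "system { PROGRAM } LIST" always treats LIST as the argument vector
    # (its first element becomes argv[0]), even for a one-element list.
    my $status = system { $program } $program, @args;
    return $status == -1 ? -1 : $status >> 8;
}

# A string that would be dangerous if a shell ever parsed it.
my $name = q{it's a "funky" $name; rm -rf *.txt};

# Hand the expected value to the child through the environment so the
# child can check that its argument arrived intact.
$ENV{EXPECTED} = $name;
my $exit = run_noshell($^X, '-e',
    'exit($ARGV[0] eq $ENV{EXPECTED} ? 0 : 1)', '--', $name);

print "exit code: $exit\n";
```

An exit code of 0 means the child received the string unchanged. According to the comments on this answer, the `PROGRAM LIST` form also avoids the `cmd.exe` fallback on Windows; this sketch was only reasoned through for Unix-like systems.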

Sinan Ünür
  • shell escaping *is* different from regexp escaping, and although I can't come up with a situation where quotemeta would give a truly unsafe result, it's not meant for the task. If you must escape, instead of bypassing the shell, I suggest trying `String::ShellQuote` which takes a more conservative approach using `sh` single quotes to defang everything except single quotes themselves, and backslashes for single quotes. – hobbs Aug 13 '09 at 14:25
  • While `quotemeta()` wasn't built for this task, it should work robustly on Unix-like systems (`String::ShellQuote` only works on Unix-like systems too and provides no advantage here). On Windows, simply enclosing each filename in double-quotes should do. See my answer for a simple cross-platform subroutine (which, on Unix, encloses arguments in single-quotes, with embedded single-quotes "escaped" as `'\''`). – mklement0 Aug 25 '15 at 02:02
  • As for NUL (`0x0`) chars.: from what I can tell, neither POSIX-like shells on Unix-like systems nor `cmd.exe` on Windows support NUL characters on the command line (as opposed to file/pipe input), so I don't think supporting NUL chars. is a concern (whether the filesystem supports NUL chars. in filenames is therefore a moot point, given that you wouldn't be able to refer to such files by name on the command line). – mklement0 Aug 25 '15 at 02:03
  • Yes, sadly, the `LIST` syntax form of `system()` and `exec()` _falls back_ to `cmd.exe` on Windows, if the initial, shell-less invocation failed. However, by using the `PROGRAM LIST` syntax form, you can avoid that. Unfortunately, hobbs' output-capturing `open my $fh, '-|'` approach does _not_ offer this variant, and is therefore invariably susceptible to fallback to `cmd.exe`. – mklement0 Aug 25 '15 at 02:10

tl;dr

The following subroutine safely quotes (escapes) a list of filenames (paths) on both Unix-like and Windows systems:

#!/usr/bin/env perl
use strict;
use warnings;

# Note: the /r substitution modifier used below requires Perl 5.14+.
sub quoteforshell { 
  return join ' ', map { 
    $^O eq 'MSWin32' ?
      '"' . s/"/""/gr . '"'
      : 
      "'" . s/'/'\\''/gr . "'" 
  } @_;
}

# Sample invocation
my $shellcmd = ($^O eq 'MSWin32' ? 'echo ' : 'printf "%s\n" ') . 
  quoteforshell('\\foo/bar', 'I\'m here', '3" of snow', 'bar |&;()<>#!');

print `$shellcmd`;

Output of the sample command on Unix-like systems, showing that all input arguments were passed through unmodified:

\foo/bar
I'm here
3" of snow
bar |&;()<>#!
  • On Unix-like systems, it should work with any strings (except ones with embedded NUL chars), not just filenames - see below for details.

  • On Windows, embedded " instances are escaped as "", which is the only safe way to do it, but, sadly, may not be what the target program expects - see below for details; note, however, that this is not a concern if you're only passing filenames on Windows, because " is not a legal filename character.

  • See the bottom of this post for a shell-less command-invocation alternative that bypasses the "-quoting problem on Windows.


On Unix-like platforms, qx// (the generalized form of `...`) and the single-argument forms of system and exec invoke the shell by passing the command to /bin/sh -c. /bin/sh is assumed to be POSIX-compatible (and may or may not be Bash on a given system).

More precisely, the single-argument forms of system and exec involve a shell only when needed: they decide based on the specific command passed. For instance, if a command has embedded (literal) single- or double-quotes, the shell is called. Since the solution below is based on embedding single-quoted tokens in the command string, it also works with the single-argument form of system and exec.

In POSIX-compatible shells you can take advantage of single-quoted strings, which do not interpolate their contents in any way.

The only challenge is to escape single-quotes (') themselves, which requires trickery, because, strictly speaking, embedding single-quotes in a single-quoted string is not supported by the shell.

The trick is to replace every ' instance with '\'' (sic), which works around the problem by effectively splitting the input string into multiple single-quoted strings, with escaped ' instances - \' - spliced in - the shell then reassembles the string parts into a single string.

Here's a subroutine that takes a list of strings (filenames) and returns a space-separated string of quoted versions of the strings, which guarantees literal use by the shell:

sub quoteforsh { join ' ', map { "'" . s/'/'\\''/gr . "'" } @_ }

Example (uses most POSIX shell metacharacters):

my $shellcmd = 'printf "%s\n" ' . 
                  quoteforsh('\\foo/bar', 'I\'m here', '3" of snow', 'bar |&;()<>#!');
print `$shellcmd`;

This passes the following to /bin/sh -c (shown here as a pure literal, without any quoting):

 printf "%s\n" '\foo/bar' 'I'\''m here' '3" of snow' 'bar |&;()<>#!'

Note how each input string is enclosed in single-quotes, and how the only character that needed quoting among all input strings was ', which, as discussed, was replaced with '\''.

This should output the input strings as-is, one on each line:

\foo/bar
I'm here
3" of snow
bar |&;()<>#!
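Since the `/r` modifier used above only exists in Perl 5.14 and later, here is a sketch of the same quoting without it, followed by a round-trip check through `/bin/sh`. The name `quoteforsh_compat` and the extra test strings are assumptions made for this example, not part of the original answer, and a Unix-like system is assumed.

```perl
use strict;
use warnings;

# Same quoting as quoteforsh(), written without the /r modifier so that it
# also runs on Perl versions older than 5.14.
sub quoteforsh_compat {
    return join ' ', map {
        my $copy = $_;
        $copy =~ s/'/'\\''/g;    # every ' becomes '\''
        "'$copy'";               # enclose the result in single-quotes
    } @_;
}

my @inputs = ('\\foo/bar', "I'm here", '3" of snow', 'bar |&;()<>#!',
              q{''}, '$HOME `id`');

# Round trip: have /bin/sh print each argument on its own line.
my $shellcmd = q{printf '%s\n' } . quoteforsh_compat(@inputs);
chomp(my @outputs = `$shellcmd`);

print join("\n", @outputs) eq join("\n", @inputs)
    ? "round trip ok\n"
    : "round trip FAILED\n";
```

The check splits the output on newlines, so it only works for inputs that do not themselves contain newlines; the quoting itself does not have that restriction.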

On Windows, the analogous subroutine looks like this:

sub quoteforcmdexe { join ' ', map { '"' . s/"/""/gr . '"' } @_ }

This works analogously to quoteforsh() above, except that

  • double-quotes are used to enclose the tokens, because cmd.exe doesn't support single-quoting.
  • the only character that needs escaping is ", which is escaped as "" - note, however, that for filenames this isn't strictly necessary, because Windows doesn't allow " instances in filenames.

However, there are limitations and pitfalls:

  • You cannot suppress interpretation of references to existing environment variables, such as %USERNAME%; by contrast, non-existing variables or isolated % instances are fine.
    • Note: You should be able to escape % instances as %%, but while that works in a batch file, it inexplicably doesn't work from Perl:
      • `perl "%%USERNAME%%.pl"` complains, e.g., about %jdoe%.pl not being found, implying that %USERNAME% was interpolated, despite the doubled % chars.
      • (On the flip side, isolated % instances in double-quoted strings don't need escaping the way they do in batch files.)
  • Escaping embedded " instances as "" is the only SAFE way to do it, but it is not what most target programs expect.
    • On Windows, incredibly, the required escaping is ultimately up to the target program - for full background, see https://stackoverflow.com/a/31413730/45375
    • In short, the quandary is:
      • If you escape for the target program - and most, including Perl, expect \" - then part of the argument list may never be passed to the target program, with the remaining part either causing failure, unwanted redirection to a file, or, worse, unexpected execution of arbitrary commands.
      • If you escape for cmd.exe, you may break the target program's parsing.
      • You cannot escape for both.
      • You can work around the problem if your command doesn't need involving the shell at all - see below.

Alternative: shell-less command invocation

If your command is an invocation of a single executable with all arguments to be passed as-is, there's no need to involve the shell at all, which:

  • doesn't require quoting of the arguments, which notably bypasses the "-quoting problem on Windows
  • is generally more efficient

The following subroutine works on both Unix-like systems and Windows, and is a shell-less alternative to qx// (`...`), which accepts the command to invoke as a list of arguments to interpret as-is:

sub qxnoshell {
  use IPC::Cmd;
  return unless @_;
  my @cmdargs = @_;
  if ($^O eq 'MSWin32') { # Windows
    # Ensure that the executable name ends in '.exe'
    $cmdargs[0] .= '.exe' unless $cmdargs[0] =~ m/\.exe$/i;
    unless (IPC::Cmd::can_run $cmdargs[0]) { # executable not found
      # Issue warning, as qx// would and open '-|' below does.
      my $warnmsg = "Executable '$cmdargs[0]' not found";
      scalar(caller) eq 'main' ? warn($warnmsg . "\n") : warnings::warnif('exec', $warnmsg);
      return; 
    }
    for (@cmdargs[1..$#cmdargs]) {
      if (m'"') {
        s/"/\\"/g; # \-escape all embedded double-quotes
        $_ = '"' . $_ . '"'; # enclose as a whole in embedded double-quotes
      }
    }
  }
  open my $fh, '-|', @cmdargs or return;
  my @lines = <$fh>;
  close $fh;
  return wantarray ? @lines : join('', @lines);
}

Examples

# Unix: $out should receive literal '$$', which demonstrates that
# /bin/sh is not involved.
my $out = qxnoshell 'printf', '%s', '$$';

# Windows: $out should receive literal '%USERNAME%', which demonstrates
# that cmd.exe is not involved.
my $out = qxnoshell 'perl', '-e', 'print "%USERNAME%"';
  • Requires Perl v5.9.5+ due to use of IPC::Cmd.
  • Note that the subroutine works hard to make things work on Windows:
    • Even though the arguments are passed as a list, open ..., '-|' on Windows still falls back on cmd.exe if the initial invocation attempt fails - the same applies to system() and exec(), incidentally.
    • Thus, in order to prevent this fallback to cmd.exe - which can have unintended consequences - the subroutine (a) ensures that the first list argument is an *.exe executable, (b) tries to locate it, and (c) only tries to invoke the command if the executable could be located.
    • On Windows, sadly, any argument that contains embedded double-quotes is not passed through correctly to the target program - it needs escaping by (a) adding embedded double-quotes to enclose that argument, and (b) by escaping the original embedded double-quotes as \".
mklement0
  • Great write-up. By the way, I am doing some research on shell quoting for recursive `system` calls (e.g. `system 'bash -c 'bash -c '\''echo "hello";'\''';`). These types of calls are not handled correctly by [`String::ShellQuote`](https://metacpan.org/pod/String::ShellQuote). I contacted the maintainer two weeks ago, but he seems to have left the scene. I wondered if you might be interested in helping improve that module? (It also seems to be lacking Windows support.) – Håkon Hægland Aug 25 '15 at 11:27
  • Thanks, @HåkonHægland. I appreciate the suggestion, but can't spend the time at the moment (I emailed the maintainer a link to this answer, but your experience suggests that that may go unheard). – mklement0 Aug 25 '15 at 12:27
  • I encountered many use cases. The most common is `ssh` commands, see for example [Quoting in bash and perl in recursive ssh command](http://stackoverflow.com/questions/23597777/quoting-in-bash-and-perl-in-recursive-ssh-command). Another use case is when you need to load `~/.bashrc`, see for example [Running system command under interactive bash shell](http://stackoverflow.com/questions/27581085/running-system-command-under-interactive-bash-shell) – Håkon Hægland Aug 25 '15 at 13:04
  • Also, it is possible (and in my opinion the best approach) to use single quotes (and not double quotes) with nested commands, for example: `bash -c 'bash -c '\''bash -c '\''\'\'''\''echo "hello";'\''\'\'''\'''\'''` is a level 3 nested command, and will print `"hello"` – Håkon Hægland Aug 25 '15 at 13:16
  • @HåkonHægland: Thanks for the use cases. You're right: nesting with single-quotes _is_ possible; my misconception was that I assumed a _single_ Bash instance would have to recognize nested single-quotes, but that's not true: the successive instances each see a valid command resulting from string processing in the previous instance. – mklement0 Aug 25 '15 at 14:03
  • @HåkonHægland: Just to be clear about the scope of `String::ShellQuote::shell_quote()` and `quoteforshell()` above: they take one or more tokens to be quoted _individually_, and the result is intended to serve as a _top-level building block_ for a shell-command string; that is, they're only guaranteed to work as distinct argument(s), _outside of any quoted strings_ in the overall shell command. I have a vague sense of what you're after, but it's mind-bending stuff, and I encourage you to post a new question where you describe the desired functionality in _general_, parameterized terms. – mklement0 Aug 25 '15 at 15:00