
Somebody came up with the brilliant idea of putting spaces in a filename. I need to run scp from Python using that filename, which is problematic because the shell parses the command, and scp also has some quirks regarding spaces. This is my test code:

import subprocess
import shlex


def split_it(command):
    return shlex.split(command)
    #return command.split(" ")


def upload_file(localfile, host, mypath):
    command = split_it('scp {} {}:"{}"'.format(localfile, host, mypath))
    print(command)
    res = subprocess.run(command, stdout=subprocess.PIPE)
    return res.stdout.decode()


upload_file("localfile.txt", "hostname", "/some/directory/a file with spaces.txt")

Which gives:

['scp', 'localfile.txt', 'hostname:/some/directory/a file with spaces.txt']
scp: ambiguous target

Using the naive version with command.split(" "):

['scp', 'localfile.txt', 'hostname:"/some/directory/a', 'file', 'with', 'spaces.txt"']
spaces.txt": No such file or directory

The right, working scp command would be:

['scp', 'localfile.txt', 'hostname:"/some/directory/a file with spaces.txt"']
  1. Is there a ready solution for this?
  2. If not, what would be the robust way of doing:
split_it('scp localfile.txt hostname:"/some/directory/a file with spaces.txt"')
# returns ['scp', 'localfile.txt', 'hostname:"/some/directory/a file with spaces.txt"']
mkrieger1
blueFast
  • Related, although not directly a duplicate: https://stackoverflow.com/questions/19858176/how-to-escape-spaces-in-path-during-scp-copy-in-linux – mkrieger1 Jun 16 '17 at 19:08
  • @mkrieger1: the quoting in your link is precisely the way I create the scp command at the start. The trouble starts after that. – blueFast Jun 16 '17 at 19:09
  • Yes, as stated in the top answer there, you need to somehow double-escape the spaces. But I'm not sure if there's a good way to do that in Python. – mkrieger1 Jun 16 '17 at 19:10
  • @mkrieger1 shlex is doing a good job with parsing the shell parameters, taking spaces into account. The problem is that, *after that*, I need to re-quote the filename with spaces so that scp sees the correct filename – blueFast Jun 16 '17 at 19:12
  • Yes, I know. I think we've both understood the problem. – mkrieger1 Jun 16 '17 at 19:13
  • Maybe [this](https://docs.python.org/3/library/shlex.html#shlex.quote) helps, or [this](https://docs.python.org/2/library/pipes.html#pipes.quote) if using older Python versions. Got it from https://stackoverflow.com/questions/22729431/python-scp-copy-file-with-spaces-in-filename (*that* might be a duplicate) – mkrieger1 Jun 16 '17 at 19:17
  • Btw, I don't understand why you use `split_it` to split the string you've just assembled. Better create the correct argument list in the first place. – mkrieger1 Jun 16 '17 at 19:20
  • i.e., `command = ['scp', localfile, '{}:{}'.format(host, shlex.quote(mypath))]` – mkrieger1 Jun 16 '17 at 19:22
  • @mkrieger1 because once confronted with the problem, I wanted a general solution: *any* parameter could have spaces. The command to run could be, instead `scp`, `my strange command`. I wanted a solution which works in a generic way, but I will probably give up and use your suggestion if it works. Let me test it and I'll accept it as answer. – blueFast Jun 16 '17 at 19:26
  • @delavnog, **any** parameter is, eventually, passed to the program you're running as an array element (remember, `int main(int argc, char** argv)` is part of calling convention for **any** case when you're calling a new executable, no matter what language it's implemented in). Why are you trying to create a string in such a format as to be correctly split into an array, instead of just specifying the array of strings you actually want in the first place? – Charles Duffy Jun 16 '17 at 19:28
  • @CharlesDuffy since we are talking about shell commands, I wanted to reuse as much of the syntax as possible. That means, reusing the concept of spaces as arguments separators, which is what shlex does. This way I can define my command as a simple string, and split it to pass it to the subprocess module. This approach has limitations as it seems ... – blueFast Jun 16 '17 at 19:31
  • @mkrieger1 subprocess's documentation suggests `shlex.split` can be useful when tokenization is not obvious, I think it's a reasonable usage. The problem with the file with a space is that it wouldn't be possible to know, as far as I can tell, that it really is a file with a space, and not two files, which is why it has to be quoted. – Paulo Almeida Jun 16 '17 at 19:31
  • The problem arises not from splitting the command with shlex, which works fine, but from the idiosyncrasies of scp, I would say – blueFast Jun 16 '17 at 19:32
  • @delavnog, reusing shell syntax is a Bad Idea (and I'm saying this as someone near the top of the leaderboard on many of StackOverflow's shell-related tags). Separating data from code is a critical issue when trying to write correct shell scripts, but the mechanisms used to do so when writing real scripts aren't available when you don't have an out-of-band mechanism to pass data. – Charles Duffy Jun 16 '17 at 19:33
  • that said, yes, scp *is* idiosyncratic, and *does* make the situation worse than it would otherwise be. – Charles Duffy Jun 16 '17 at 19:35
  • Frankly, what @mkrieger1's answer promotes is the exact same thing I would advise someone to do when writing code in native shell to generate a scp command line copying to a remote name with spaces. (The native bash equivalent is: `printf -v rmt_name_q '%q' "$rmt_name"; scp "$local_name" "${host}:$rmt_name_q"`). – Charles Duffy Jun 16 '17 at 19:39
  • @delavnog Do you have a problem with my answer? If you `shlex.quote` the paths (both local and remote) you shouldn't have a problem. – Paulo Almeida Jun 16 '17 at 19:39
  • @PauloAlmeida, quoting the local path is incorrect (if you're passing an explicit array to the local command rather than having both local and remote expansion phases); only the remote name is subject to an `eval` pass. – Charles Duffy Jun 16 '17 at 19:40
  • @CharlesDuffy re native bash: if rmt_name has spaces (but no other strange characters, like maybe quotes), putting it in quotes is enough: `scp "$local_name" "${host}:\"${rmt_name}\""`. Your solution is more robust, but then you also need to do the same for `$local_name`? – blueFast Jun 16 '17 at 19:46
  • @CharlesDuffy Ok, so if the local file has spaces it won't matter? I didn't know, I just included it as a guess, and noticed it wouldn't harm anything either. – Paulo Almeida Jun 16 '17 at 19:47
  • @PauloAlmeida, incorrect -- it *does* harm things. `touch 'foo bar'; scp 'foo\ bar' host:` fails. – Charles Duffy Jun 16 '17 at 20:00
  • @delavnog, you not only *don't need to* do the same thing for the local name, but *can't* without causing breakage. A local name is treated as a literal filename, not unescaped after it reaches `scp`. Thus, the only case where you need to escape the local name is if there's a shell parsing it before `scp` is executed, and if that's happening, you're Doing It Wrong. – Charles Duffy Jun 16 '17 at 20:02
  • ...now, one might need to fully-qualify a local name to make it unambiguous -- for a file named `host:`, you'd want to refer to it as `./host:` or `/full/path/to/host:` to prevent scp from treating it as a remote name, but that's a somewhat different matter. – Charles Duffy Jun 16 '17 at 20:04
  • @CharlesDuffy Ok, you're right of course, I only tested with the example in the question, which doesn't have spaces, but that would be pointless. Quoting breaks precisely the files that would make me want to quote them in the first place :) – Paulo Almeida Jun 16 '17 at 20:17
  • @CharlesDuffy the python subprocess module can indeed use the shell to run the command, but it is by default disabled: https://docs.python.org/3/library/subprocess.html#frequently-used-arguments – blueFast Jun 16 '17 at 20:39
  • @delavnog, I'm well aware. That said, using `shlex.split()` is encouraging near-workalike behavior to `shell=True` -- not all the vulnerabilities, but *some* of them, and for no good reason. – Charles Duffy Jun 16 '17 at 20:41
  • (ex., re: "some of them" -- someone creating a directory with `mkdir -p './ /etc/passwd '` and then passing the name of a file in that directory to a command shouldn't be able to cause `/etc/passwd` to be treated as a distinct argument *under any circumstances*; with `shlex.split()`, one needs to be using `shlex.quote()` to prevent that up-front, whereas if you're passing arguments as distinct array elements, there's nothing to prevent). – Charles Duffy Jun 16 '17 at 20:44
  • @CharlesDuffy that's a convoluted, but illustrative and serious, example – blueFast Jun 16 '17 at 20:46
  • Plus one. A question tagged with SCP that actually involves code! – jww Jun 25 '17 at 05:03
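The argument-injection concern raised in the comments can be sketched in a few lines. This is only an illustration: the filename below is hypothetical (a file `notes.txt` inside a directory created as in the `mkdir -p './ /etc/passwd '` example), and `cat` stands in for whatever command would receive it. Nothing is executed; the snippet only compares the argument lists that each approach produces.

```python
import shlex

# Hypothetical path: a file inside a directory whose name contains spaces,
# as in the mkdir example from the comments.
filename = "./ /etc/passwd /notes.txt"

# Formatting the name into a string and splitting it lets the spaces
# act as argument separators, so /etc/passwd becomes its own argument.
unsafe = shlex.split("cat {}".format(filename))

# Quoting the name before formatting keeps it as one argument...
quoted = shlex.split("cat {}".format(shlex.quote(filename)))

# ...but building the list directly needs no quoting at all.
direct = ["cat", filename]

print(unsafe)
print(quoted)
print(direct)
```

With the list built directly, the filename can never be split, which is why the comments recommend that form over assembling and re-splitting a string.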

1 Answer

command = split_it('scp {} {}:"{}"'.format(localfile, host, mypath))
  1. Instead of building a command string, only to split_it again, directly build a list of arguments.

  2. In order to add one layer of quoting to the remote file path, use shlex.quote (or pipes.quote if using older Python versions).

command = ['scp', localfile, '{}:{}'.format(host, shlex.quote(mypath))]
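Put together with the function from the question, this could look like the sketch below. It assumes the scp behavior discussed in this thread, where the remote side parses the path with a shell and so needs one extra layer of quoting; the helper name `build_scp_command` is made up for illustration. The local path is deliberately left unquoted, because it is passed to scp as a literal argument.

```python
import shlex
import subprocess


def build_scp_command(localfile, host, remote_path):
    # Hypothetical helper: build the argument list directly.
    # Only the remote path gets a layer of shell quoting; the local
    # path is handed to scp as-is.
    return ['scp', localfile, '{}:{}'.format(host, shlex.quote(remote_path))]


def upload_file(localfile, host, remote_path):
    command = build_scp_command(localfile, host, remote_path)
    # No shell is involved locally: subprocess receives the list unchanged.
    res = subprocess.run(command, stdout=subprocess.PIPE)
    return res.stdout.decode()


command = build_scp_command(
    "localfile.txt", "hostname", "/some/directory/a file with spaces.txt")
print(command)
```

Note that `shlex.quote` wraps the path in single quotes rather than the double quotes shown in the question; a POSIX shell on the remote side treats either form as a single word.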

Sources/related posts:

mkrieger1
  • This doesn't appear to work on Windows: it adds a single quote when I need a double quote to allow Windows to interpret spaces etc. If I hard-code a double quote, then Popen converts it to \" and my scripts get confused – Jim Feb 12 '20 at 22:17
  • Sounds like you should ask a new question with a [minimal example reproducing your problem](https://stackoverflow.com/help/minimal-reproducible-example). – mkrieger1 Feb 12 '20 at 22:19