18

On Machine1, I have a Python 2.7 script that computes a big (up to 10 MB) binary string in RAM that I'd like to write to a disk file on Machine2, which is a remote machine. What is the best way to do this?

Constraints:

  • Both machines are Ubuntu 13.04. The connection between them is fast -- they are on the same network.

  • The destination directory might not yet exist on Machine2, so it might need to be created.

  • If it's easy, I would like to avoid writing the string from RAM to a temporary disk file on Machine1. Does that eliminate solutions that might use a system call to rsync?

  • Because the string is binary, it might contain bytes that could be interpreted as a newline. This would seem to rule out solutions that might use a system call to the echo command on Machine2.

  • I would like this to be as lightweight on Machine2 as possible. Thus, I would like to avoid running services like ftp on Machine2 or engaging in other configuration activities there. Plus, I don't understand security that well, and so would like to avoid opening additional ports unless truly necessary.

  • I have ssh keys set up on Machine1 and Machine2, and would like to use them for authentication.

  • EDIT: Machine1 is running multiple threads, and so it is possible that more than one thread could attempt to write to the same file on Machine2 at overlapping times. I do not mind the inefficiency caused by having the file written twice (or more) in this case, but the resulting datafile on Machine2 should not be corrupted by simultaneous writes. Maybe an OS lock on Machine2 is needed?

I'm rooting for an rsync solution, since it is a self-contained entity that I understand reasonably well, and requires no configuration on Machine2.

martineau
Iron Pillow
  • You can take a look at Python sockets (TCP sockets in your case). Whatever scheme you need can be implemented with them. – Louis Hugues Oct 05 '13 at 20:28
  • sftp seems like a likely candidate. https://wiki.python.org/moin/SecureShell http://stackoverflow.com/questions/432385/sftp-in-python-platform-independent – Robᵩ Oct 05 '13 at 20:28
  • How long would it take to transfer these 10 MB to the other side? Are broken connections and resuming likely? These questions might be relevant to decide if [Erik Allik's solution](http://stackoverflow.com/a/19202567/296974) - which would be my favourite as well - is usable here. – glglgl Oct 05 '13 at 21:00
  • @SioulSeuguh Not without opening an additional port - which seems to be unwanted here. SSH connection would probably be better... – glglgl Oct 05 '13 at 21:01
  • Edited the question to state that the connection between the machines is fast. – Iron Pillow Oct 05 '13 at 21:05

6 Answers

22

Paramiko supports opening files on remote machines:

import paramiko

def put_file(machinename, username, dirname, filename, data):
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(machinename, username=username)
    sftp = ssh.open_sftp()
    try:
        # create the destination directory (a single level only);
        # ignore the error if it already exists
        sftp.mkdir(dirname)
    except IOError:
        pass
    # write the in-memory data to the remote file
    f = sftp.open(dirname + '/' + filename, 'w')
    f.write(data)
    f.close()
    ssh.close()


data = 'This is arbitrary data\n'.encode('ascii')
put_file('v13', 'rob', '/tmp/dir', 'file.bin', data)
Robᵩ
  • +1, great solution to begin with (actually, it doesn't account for deeper paths (`/a/b/c/d` or so), if even `b` or `c` don't exist yet...). – glglgl Oct 05 '13 at 21:27
  • @glglgl - agreed, but I probably won't fix it. – Robᵩ Oct 05 '13 at 22:34
  • @Robᵩ No, that's up to whoever needs it. – glglgl Oct 06 '13 at 12:45
  • Are the data ASCII encoded because `f.write(data)` requires ASCII data (seems hard to believe) or because it's just good form to specify encoding, even on an example string? – Iron Pillow Oct 06 '13 at 21:44
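As the comments point out, sftp.mkdir only creates a single path component. Below is a minimal sketch of a recursive variant that could replace the try/except block above; the helper name mkdir_p is just illustrative, and it uses only standard Paramiko SFTP calls (stat and mkdir):

def mkdir_p(sftp, remote_path):
    # walk the path one component at a time, creating whatever is missing
    current = '/' if remote_path.startswith('/') else ''
    for part in [p for p in remote_path.split('/') if p]:
        current = current + part if current in ('', '/') else current + '/' + part
        try:
            sftp.stat(current)   # this component already exists
        except IOError:
            sftp.mkdir(current)  # create the missing component

Calling mkdir_p(sftp, dirname) in place of the try/except block would then also handle deeper paths such as /a/b/c/d.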
7

You open a new SSH process to Machine2 using subprocess.Popen and then you write your data to its STDIN.

import subprocess

cmd = ['ssh', 'user@machine2',
       'mkdir -p output/dir; cat - > output/dir/file.dat']

p = subprocess.Popen(cmd, stdin=subprocess.PIPE)

your_inmem_data = 'foobarbaz\0' * 1024 * 1024

# stream the in-memory data to the remote cat in 1 KB chunks
for chunk_ix in range(0, len(your_inmem_data), 1024):
    chunk = your_inmem_data[chunk_ix:chunk_ix + 1024]
    p.stdin.write(chunk)

# close stdin so the remote cat sees EOF, then wait for ssh to finish
p.stdin.close()
p.wait()

I've just verified that it works as advertised and copies all of the 10485760 dummy bytes.

P.S. A potentially cleaner/more elegant solution would be to have the Python program write its output to sys.stdout instead and do the piping to ssh externally:

$ python process.py | ssh <the same ssh command>
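For completeness, here is a minimal sketch of what such a process.py could look like; the data here is just a stand-in for the real in-memory string:

import sys

# compute (or otherwise obtain) the binary string in RAM
data = 'foobarbaz\0' * 1024 * 1024

# write the raw bytes to stdout; the shell pipeline above feeds them to ssh
sys.stdout.write(data)
sys.stdout.flush()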
Erik Kaplun
  • This looks very good, but is there a typo involving quotation marks in the second line? – Iron Pillow Oct 05 '13 at 20:57
  • Why `shell=True`? A mere `ssh_cmd_list = ['ssh', 'user@machine2', 'mkdir -p output/dir; cat - > output/dir/file.dat']` followed by a `p = subprocess.Popen(ssh_cmd_list, stdin=subprocess.PIPE)` makes the stuff much easier to read and removes a layer of complexity, what the additional shell layer would be. – glglgl Oct 05 '13 at 20:58
  • @glglgl: then you need the full path to `ssh` I'm afraid; but anyway... I can't find a typo. Basically I'm just providing sufficiently perfected code that works; the OP is free to amend, adapt, transform and clean up :) – Erik Kaplun Oct 05 '13 at 21:03
  • Aha. I was unfamiliar with what appears to be a ('foo' 'bar') concatenation syntax. – Iron Pillow Oct 05 '13 at 21:11
  • @IronPillow: actually it's just that the syntax of string literals is such that two or more adjacent literals are treated as one, so `'foo''bar'` is `'foobar'` and so is `('foo' 'bar')`; the linebreak is simply ignored and the parens are needed to avoid a \ line continuation :) – Erik Kaplun Oct 05 '13 at 21:12
  • Just to clear up one misunderstanding: even in `shell=False` mode, you don't have to provide the full path of the executable - `Popen()` finds it for you. (See [here](http://docs.python.org/2/library/subprocess.html#subprocess.call) how `subprocess.call(["ls", "-l"])` is working code, and see [here](http://docs.python.org/2/library/subprocess.html#replacing-older-functions-with-the-subprocess-module) for other examples.) – glglgl Oct 05 '13 at 21:30
  • @glglgl: that's good to keep in mind; I've updated the answer accordingly! – Erik Kaplun Oct 05 '13 at 21:36
  • @IronPillow: just wondering: are you still looking for better solutions? – Erik Kaplun Oct 05 '13 at 22:16
  • Two questions about this method: (1) What reasons, if any, would weight against increasing the block size to 10MB? (2) Machine1 is running multiple threads, and it is possible that two threads will try to write the same file to Machine2 at overlapping times -- what happens then? I guess I should update the question to reflect this additional constraint. – Iron Pillow Oct 06 '13 at 21:09
  • (1) just try and see? (2) you'd then need to implement some sort of file locking, and this is not possible over an SSH connection—something on the other side must handle that; but then you just use that instead of `cat` so the transfer itself is the same; for file locking, see the pylockfile package. – Erik Kaplun Oct 06 '13 at 22:42
  • ...@IronPillow or if you just want to keep the last file, you can write to a temporary file and then `mv` it in place. – Erik Kaplun Oct 06 '13 at 22:42
  • `mv` is a great idea! – Iron Pillow Oct 06 '13 at 23:29
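Here is a minimal sketch of the temporary-file-plus-mv idea from the comments, adapted from the command list in this answer. The $$ is expanded by the remote shell to its process ID, so concurrent writers use distinct temporary names, and mv within the same directory is a rename, so the final file is never seen half-written:

cmd = ['ssh', 'user@machine2',
       'mkdir -p output/dir; '
       'cat - > output/dir/file.dat.$$ && '
       'mv output/dir/file.dat.$$ output/dir/file.dat']

The rest of the Popen/stdin code stays the same.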
3

A slight modification to @Erik Kaplun's answer: the code below worked for me (using communicate() rather than .stdin.write).

import subprocess

# data must already be a byte string; convert/encode it first if needed
cmd = ['ssh', 'user@machine2', 'cat - > /path/filename']
p = subprocess.Popen(cmd, stdin=subprocess.PIPE)
# communicate() writes data to ssh's stdin, closes it and waits for exit
p.communicate(data)
Enes R. A.
  • Concise, nice. Might be worth including a mkdir, or a mention of the pitfall. Save someone some tears. – mcint Jul 01 '21 at 10:18
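As the comment suggests, the remote command can also create the destination directory first, in the same way as the earlier answer (the path here is illustrative):

cmd = ['ssh', 'user@machine2', 'mkdir -p /path/to/dir; cat - > /path/to/dir/filename']
p = subprocess.Popen(cmd, stdin=subprocess.PIPE)
p.communicate(data)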
2

We can write a string to a remote file in three simple steps:

  1. Write string to a temp file
  2. Copy temp file to remote host
  3. Remove temp file

Here is my code (without any third-party libraries):

import os

content = 'sample text'
remote_host = 'your-remote-host'
remote_file = 'remote_file.txt'

# step 1: write the string to a temporary file
tmp_file = 'tmp_file.txt'
with open(tmp_file, 'w') as f:  # use 'wb' if the data is binary
    f.write(content)

# step 2
command = 'scp %s %s:%s' % (tmp_file, remote_host, remote_file)
os.system(command)

# step 3
os.remove(tmp_file)
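Note that scp does not create missing directories on the remote side, so if the destination path includes a directory that might not exist yet (as in the question), it can be created first over ssh. A small sketch in the same style; the directory path is purely illustrative:

# create the remote directory before copying (illustrative path)
os.system('ssh %s "mkdir -p /path/to/remote/dir"' % remote_host)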
Vu Anh
0

If just calling a subprocess is all you want, maybe sh.py could be the right thing.

from sh import ssh
remote_host = ssh.bake(<remote host>) 
remote_host.dd(_in = <your binary string>, of=<output filename on remote host>) 
LeuX
0

A solution in which you don't explicitly send your data over some connection would be to use sshfs. You can use it to mount a directory from Machine2 somewhere on Machine1; writing to a file in that directory will then automatically result in the data being written to Machine2.

brm
  • This is clever and elegant, but it's not clear what happens if Machine1 reboots. I did not study the docs, but it appears that the connection would be lost, and would need to be re-established manually. – Iron Pillow Oct 06 '13 at 20:57
  • Actually, if either Machine1 or Machine2 reboots it could be a problem. – Iron Pillow Oct 06 '13 at 21:11
  • @IronPillow Maybe `-o reconnect` would help? – tshepang Oct 06 '13 at 21:22
  • @IronPillow perhaps doing the mount and umount of Machine2 from your Python script could help somewhat – brm Oct 07 '13 at 14:38
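Building on the last comment, here is a minimal sketch of doing the mount and unmount from the Python script. It assumes sshfs is installed on Machine1; the mount point, remote path and file name are illustrative:

import subprocess

mountpoint = '/mnt/machine2'        # local mount point (illustrative)
remote = 'user@machine2:/data/dir'  # remote directory (illustrative)
data = 'example binary data\x00'    # stand-in for the real in-memory string

# mount the remote directory; -o reconnect retries dropped connections
subprocess.check_call(['sshfs', remote, mountpoint, '-o', 'reconnect'])
try:
    # an ordinary local write ends up on Machine2
    with open(mountpoint + '/file.bin', 'wb') as f:
        f.write(data)
finally:
    # unmount when done
    subprocess.check_call(['fusermount', '-u', mountpoint])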