49

I am trying to read a file from a server over SSH from Python, using Paramiko to connect. I can connect to the server, run a command like `cat filename`, and get the data back, but some of the files I am trying to read are 1 GB or more in size.

How can I read the file on the server line by line using Python?

Additional info: What I regularly do is run a `cat filename` command, store the result in a variable, and work off that. But since the file here is quite big, I am looking for a way to read it line by line off the server.

EDIT: I can read a chunk of data and split it into lines, but the data received in the buffer does not always end on a line boundary. For example, if the buffer holds 300 lines, the last one may be only the first half of a line on the server, with the other half arriving in the next read. I want complete lines.

EDIT 2: What command can I use to print a given range of lines in a file, such as the first 100 lines, then the next 100, and so on? That way the buffer would always contain complete lines.

Martin Prikryl
randomThought

6 Answers

90

Paramiko's SFTPClient class allows you to get a file-like object to read data from a remote file in a Pythonic way.

Assuming you have an open SSHClient:

sftp_client = ssh_client.open_sftp()
remote_file = sftp_client.open('remote_filename')
try:
    for line in remote_file:
        process(line)  # replace with your own per-line handling
finally:
    remote_file.close()
Martin Prikryl
Matt Good
    While correct, this naive implementation is very slow. It needs some improvements for good performance. See [Reading file opened with Python Paramiko SFTPClient.open method is slow](https://stackoverflow.com/q/58433996/850848). – Martin Prikryl Oct 22 '20 at 16:38
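
One mitigation often suggested for the slowness mentioned in the comment above is to ask Paramiko to prefetch the file before iterating over it. The following is a minimal sketch under that assumption, not a benchmarked fix: `iter_remote_lines` is a hypothetical helper name, and `sftp_client` is assumed to be the object returned by `ssh_client.open_sftp()`. `prefetch()` is a method of Paramiko's `SFTPFile`; how much it helps, and how much memory it uses on very large files, will depend on your network and Paramiko version.

```python
def iter_remote_lines(sftp_client, remote_path):
    """Yield lines from a remote file, prefetching to cut round trips.

    sftp_client is assumed to be a paramiko.SFTPClient, e.g. obtained
    from ssh_client.open_sftp(). iter_remote_lines is a hypothetical
    helper name, not part of Paramiko.
    """
    with sftp_client.open(remote_path, "r") as remote_file:
        # Ask Paramiko to start fetching the file in the background so
        # line iteration does not wait on a server round trip per read.
        remote_file.prefetch()
        for line in remote_file:
            yield line
```

It would be used as `for line in iter_remote_lines(sftp_client, 'remote_filename'): ...`, with the file closed automatically once iteration finishes.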
18

Here's an extension to @Matt Good's answer, using fabric:

from fabric.connection import Connection

with Connection(host, user) as c, c.sftp() as sftp, \
     sftp.open('remote_filename') as file:
    for line in file:
        process(line)

Old Fabric 1 answer:

from contextlib import closing
from fabric.network import connect

with closing(connect(user, host, port)) as ssh, \
     closing(ssh.open_sftp()) as sftp, \
     closing(sftp.open('remote_filename')) as file:
    for line in file:
        process(line)
jfs
  • I've never seen contextlib.closing before. So this lets you turn anything with a close() method into a Context Manager-like thing, notwithstanding that it may not have \_\_enter\_\_ and \_\_exit\_\_? – hughdbrown Oct 21 '09 at 04:14
  • @hughbrown: Yes. Any object with `.close()` method will do. The implementation of `closing` is trivial, see http://svn.python.org/view/python/trunk/Lib/contextlib.py?view=markup – jfs Oct 21 '09 at 20:26
  • In fact `with sftp.open('remote_filename') as f:` would also work – taras Oct 13 '15 at 20:54
  • @user128285: it may depend on specific libraries versions (newer versions make the `closing()` calls unnecessary). – jfs Oct 13 '15 at 20:59
  • ModuleNotFoundError: No module named 'fabric.network', is it version dependent? – Allan Ruin Oct 22 '20 at 03:13
  • @AllanRuin: yes, fabric 2 needs different code. I've updated the answer. – jfs Oct 22 '20 at 15:39
  • Is there any performance improvement using Fabric instead of Paramiko? – Tharindu Sathischandra Dec 21 '22 at 09:53
  • @TharinduSathischandra fabric is built on top of paramiko. Here it is used just as a convenience wrapper. – jfs Dec 21 '22 at 10:30
8
#!/usr/bin/env python
import select
import sys

import paramiko

client = paramiko.SSHClient()
client.load_system_host_keys()
client.connect('yourhost.com')
transport = client.get_transport()
channel = transport.open_session()
channel.exec_command("cat /path/to/your/file")
while True:
    # Block until the channel is readable instead of busy-polling
    rl, wl, xl = select.select([channel], [], [])
    if rl:
        data = channel.recv(1024)  # must be stdout
        if not data:
            # An empty result means the remote command has finished
            break
        sys.stdout.buffer.write(data)
client.close()
  • Good example of paramiko, but again highlights the non-line-oriented nature of this kind of task. – Joe Koberg Oct 20 '09 at 20:20
  • Just keep reading it until you get a newline or other line-terminating character. –  Oct 20 '09 at 21:57
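
The suggestion in the last comment, reading until a newline arrives, can be sketched as a small generator that carries any incomplete trailing line over to the next chunk. This is only an illustration: `iter_lines` is a hypothetical helper, and it assumes `recv` behaves like `channel.recv`, returning an empty bytes object once the stream has closed.

```python
def iter_lines(recv, chunk_size=1024):
    """Yield complete lines (without the newline) from a chunked byte source.

    recv is any callable that takes a byte count and returns bytes,
    returning b"" at end of stream (channel.recv behaves this way).
    """
    pending = b""
    while True:
        chunk = recv(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; keep the rest
        # in `pending` until the next chunk finishes that line.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            yield line
    if pending:
        # The final line had no trailing newline.
        yield pending
```

With the channel from the answer above, `for line in iter_lines(channel.recv): ...` would replace the `while True` loop, and each `line` is a complete line as bytes.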
6

It looks like paramiko added native context manager support for these objects back in Sept 2013, so if you want to combine Matt's clean answer with jfs's context managers, all you need now is:

with ssh_client.open_sftp() as sftp_client:
    with sftp_client.open('remote_filename') as remote_file:
        for line in remote_file:
            process(line)  # replace with your own per-line handling
sql_knievel
4

What do you mean by "line by line"? There are lots of data buffers between network hosts, and none of them are line-oriented.

So you can read a bunch of data, then split it into lines at the near end.

ssh otherhost cat somefile | python process_standard_input.py | do_process_locally

Or you can have a process read a bunch of data at the far end, break it up, and format it line by line and send it to you.

scp process_standard_input.py otherhost:
ssh otherhost python process_standard_input.py somefile |  do_process_locally

The only difference I would care about is which approach reduces the volume of data sent over a limited network pipe. In your situation that may or may not matter.

There is nothing wrong in general with using cat over an SSH pipe to move gigabytes of data.
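
If you would rather drive that pipe from Python than from the shell, here is a rough sketch using the standard `subprocess` module. It assumes an `ssh` client on your `PATH` and non-interactive (key-based) authentication; `stream_command_lines` is a hypothetical helper name. The pipe hands Python a file object, so iteration yields complete lines without loading the whole file into memory.

```python
import subprocess


def stream_command_lines(argv):
    """Run a command and yield its stdout one decoded line at a time."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the with-block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, argv)
```

For the example above this would be called as `stream_command_lines(["ssh", "otherhost", "cat", "somefile"])`.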

Joe Koberg
-2

I lost almost half a day of work trying to use paramiko and fabric to do this. But thanks to this answer I was able to come up with the following:

from ftplib import FTP_TLS

source = '/file/path/in/FTP/server.txt'
destiny = '/file/path/in/local/machine.txt'

with FTP_TLS() as ftps:
  ftps.connect(host, port)
  ftps.sendcmd(f'USER { username }')
  ftps.sendcmd(f'PASS { password }')

  with ftps as conn:
    with open(destiny, 'wb') as file:
      conn.retrbinary(f'RETR { source }', file.write)

  • Well, that's FTPS. So no wonder that Paramiko did not work for you. Paramiko is an SSH/SFTP client. That's a completely different protocol. FTPS has nothing to do with this question. – Martin Prikryl Mar 23 '22 at 09:03