
I am using paramiko to open a remote sftp file in Python. With the file object returned by paramiko, I am reading the file line by line and processing the information. This seems really slow compared to using Python's built-in open. Following is the code I am using to get the file object.

Using paramiko (roughly 2x slower) -

client = paramiko.SSHClient()
client.load_system_host_keys()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(myHost,myPort,myUser,myPassword)
sftp = client.open_sftp()
fileObject = sftp.file(fullFilePath,'rb')

Using the built-in open -

fileObject = open(fullFilePath, 'rb')  # open() is a builtin; importing os isn't needed
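In both cases the processing loop is the same line-by-line iteration (processLine is a placeholder for the actual work):

for line in fileObject:
    processLine(line)  # placeholder for the per-line processing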

Am I missing anything? Is there a way to make reads from the paramiko file object as fast as reads from a local file object?

Thanks!!

Rinks
  • Oh, I should mention that you don't need the 'b' in the 'rb' in your `sftp.file` call. From the paramiko docs: "The python 'b' flag is ignored, since SSH treats all files as binary." – John Lyon Sep 27 '11 at 03:16

3 Answers


Your problem is likely caused by the file being a remote object. You've opened it on the server and are requesting one line at a time, so each request takes far longer than it would if the file were sitting on your local hard drive. The best alternative is probably to copy the file down to a local location first, using Paramiko's SFTP get.

Once you've done that, you can open the local copy with the built-in open().
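A minimal sketch of that approach, reusing the sftp client from the question ('/tmp/localcopy' and processLine are placeholders):

localPath = '/tmp/localcopy'                   # placeholder local destination
sftp.get(fullFilePath, localPath)              # download the remote file in one transfer
with open(localPath, 'rb') as fileObject:
    for line in fileObject:
        processLine(line)                      # placeholder for the per-line processing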

John Lyon
  • Thanks for your reply @jozzas. Can I make use of the buffersize or something to make it better? Getting the file locally might not be an option due to security reasons. – Rinks Sep 27 '11 at 02:30
  • @Rinks you could certainly experiment with the buffer size - try making it quite large (50000?), and then experiment with this with your current method and also by calling readlines() to grab all lines in the file at once instead of requesting one line at a time. This might take a little while but once it's done, looping through the lines should be faster. You'll need to experiment to see what works best. – John Lyon Sep 27 '11 at 03:14
  • Thanks again @jozzas. I tried with the buffer size of 50000 and it is better than the default. But still, much worse compared to the local. Might have to try for local somehow. – Rinks Sep 27 '11 at 20:33
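For reference, the buffer-size experiment discussed in the comments above might look like this (a sketch; 32768 is just one value to try):

fileObject = sftp.file(fullFilePath, 'rb', bufsize=32768)  # larger read-ahead buffer
lines = fileObject.readlines()                             # fetch all lines at once, not one per request
for line in lines:
    processLine(line)                                      # placeholder for the per-line processing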

I was having the same issue and could not copy the file locally for security reasons. I solved it with a combination of prefetching and BytesIO:

import io

def fetch_file_as_bytesIO(sftp, path):
    """
    Using the sftp client, retrieve the file at the given path with prefetching.
    :param sftp: the sftp client
    :param path: path of the file to retrieve
    :return: BytesIO with the file content
    """
    with sftp.file(path, mode='rb') as file:
        file_size = file.stat().st_size
        # Queue read-ahead requests for the whole file up front,
        # instead of paying a network round trip per read.
        file.prefetch(file_size)
        file.set_pipelined()
        return io.BytesIO(file.read(file_size))
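
A hypothetical usage, reusing the sftp client from the question (processLine is a placeholder):

buf = fetch_file_as_bytesIO(sftp, fullFilePath)
for line in buf:           # BytesIO supports line iteration, yielding bytes
    processLine(line)      # placeholder for the per-line processing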
arocketman

Here is a way that works by running a command-line tool (cat) over SSH with paramiko and reading all the lines at once. It works well for me:

import paramiko

client = paramiko.SSHClient()
client.load_system_host_keys()
client.set_missing_host_key_policy(paramiko.WarningPolicy())
client.connect(hostname=host, port=port, username=user, key_filename=ssh_file)  # fill in your own connection details

stdin, stdout, stderr = client.exec_command('cat /proc/net/dev')
net_dump = stdout.readlines()
#your entire file is now in net_dump .. do as you wish with it below ...
client.close()

The files I open are quite small so it all depends on your file size. Worth a try :)
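One caveat: exec_command won't raise if cat fails (e.g. the file is missing). A hedged sketch of an exit-status check, to run before client.close():

exit_status = stdout.channel.recv_exit_status()  # blocks until the remote command finishes
if exit_status != 0:
    raise RuntimeError(stderr.read().decode())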

radtek