Yesterday I asked for help putting together a Python script to loop through folders, check the contents of each, and print a report with some basic stats on the files in those folders. Martin Prikryl pointed me to some code he developed a couple of months back. I tried it and didn't get any errors, but I didn't get any results either. Here is the code:

from ftplib import *
global ftp
import ftplib
import io
from io import StringIO
import string
import pandas as pd
from pandas.compat import StringIO
from collections import Counter

from ssl import SSLSocket

class FtpFile:

    def __init__(self, ftp, name):
        self.ftp = ftp
        self.name = name
        self.size = ftp.size(name)
        self.pos = 0

    def seek(self, offset, whence):
        if whence == 0:
            self.pos = offset
        if whence == 1:
            self.pos += offset
        if whence == 2:
            self.pos = self.size + offset
        print("seek {}".format(self.pos))

    def tell(self):
        print("tell {}".format(self.pos))
        return self.pos

    def read(self, size = None):
        if size == None:
            size = self.size - self.pos
        print("read {}".format(size))
        data = ""

        # based on FTP.retrbinary 
        # (but allows stopping after certain number of bytes read)
        ftp.voidcmd('TYPE I')
        cmd = "RETR {}".format(self.name)
        conn = ftp.transfercmd(cmd, self.pos)
        try:
            while len(data) < size:
                buf = conn.recv(min(size - len(data), 8192))
                if not buf:
                    break
                data += buf
            # shutdown ssl layer (can be removed if not using TLS/SSL)
            if SSLSocket is not None and isinstance(conn, SSLSocket):
                conn.unwrap()
        finally:
            conn.close()
        ftp.voidresp()
        print("read {}".format(len(data)))
        return data

# And then you can use it like:
ftp = FTP(portal, user_name, password)
ftp.cwd('/emm/') # folder that I'm trying to query

zipstring = StringIO()
print(zipstring)
name = "C:/Users/ryans/OneDrive/Desktop/archive.zip"
print(name)
size = ftp.size(name)
print(size)
ftp.retrbinary("RETR " + name, zipstring.write, rest = size - 1000*2024)

zip = zipfile.ZipFile(zipstring)

print(zip.namelist())

I would expect the results to be written somewhere, either to a text file or a CSV file, but I don't see any output. Also, the code runs very slowly and never actually finishes. The FTP portal I'm looking at holds around 7.6 GB and has 705 folders and files. I would like to get the file names, the dates when files were added or changed, the size of each file, and, if possible, the record count in each file. Maybe the last item is too hard to do, but I would think the others are doable.

ASH
  • You are trying to retrieve a file named `C:/Users/ryans/OneDrive/Desktop/archive.zip` from this FTP server. Is that really what you want? Also, whatever its purpose, the class FtpFile is never instantiated. You are importing StringIO twice, from different modules. You are declaring `global ftp` and importing `pandas` for apparently no reason. –  Dec 04 '18 at 14:43
  • There are probably a couple superfluous libraries being imported, yes. That's not my question though. – ASH Dec 04 '18 at 14:59
  • Ok, let's forget the code then. You want to get a list of the filenames on your server (possibly with metadata such as date or size), not the file *contents*, right? –  Dec 04 '18 at 15:06
  • Yes, that's exactly right. – ASH Dec 04 '18 at 15:10
  • Then 1) your code is not even remotely attempting to do that and has several severe bugs that you don't seem to want to address. 2) You could use the NLST command to get a file/folder list (but you won't know which is which), SIZE to get the size, and [MLSD](https://tools.ietf.org/html/rfc3659) to get more information (if your server supports it); if it doesn't, you will have to parse the output of DIR (not standardized AFAIK). Python's FTP class has methods for those FTP commands. But without knowing what your server supports/prints, there is no way to help more. –  Dec 04 '18 at 15:38
  • Your previous question was about ["checking the contents of **zipped folders in an FTP portal**"](https://stackoverflow.com/q/53599630/850848). That's why I pointed you to my answer to [Get files names inside a zip file on FTP server without downloading whole archive](https://stackoverflow.com/q/53143518/850848). Your current question **does not mention *"ZIP"* at all**. Hence it's clear why @Jean-ClaudeArbaut is confused by your question. – Martin Prikryl Dec 05 '18 at 07:02
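
As a rough illustration of the MLSD suggestion in the comments above: when the server supports the command, `ftplib.FTP.mlsd()` yields `(name, facts)` pairs, where `facts` is a dict of strings such as `type`, `size` and `modify`. The sketch below turns those pairs into report rows. The helper name `describe_entries` and the row layout are hypothetical, and which facts a given server actually returns is an assumption, so missing ones are handled as `None`.

```python
from datetime import datetime


def describe_entries(entries):
    """Turn (name, facts) pairs, as yielded by ftplib.FTP.mlsd(), into report rows.

    Hypothetical helper: the set of facts a server sends varies, so
    anything missing is reported as None rather than raising.
    """
    rows = []
    for name, facts in entries:
        entry_type = facts.get("type", "unknown")
        # Skip the "current directory" and "parent directory" entries.
        if entry_type in ("cdir", "pdir") or name in (".", ".."):
            continue
        modify = facts.get("modify")
        # MLSD timestamps look like YYYYMMDDHHMMSS, optionally followed
        # by fractional seconds; only the first 14 characters are parsed.
        modified = datetime.strptime(modify[:14], "%Y%m%d%H%M%S") if modify else None
        size = facts.get("size")
        rows.append({
            "name": name,
            "type": entry_type,
            "size": int(size) if size is not None else None,
            "modified": modified,
        })
    return rows
```

With a connected `ftp` object this could be called as `describe_entries(ftp.mlsd('/emm/'))`. Entries whose `type` is `dir` can then be listed in turn to walk subfolders, without downloading any file contents.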

1 Answer

I put together some code that seems to work pretty well. I'm sure this can be improved, but for now, it's good enough.

import ftplib
from datetime import datetime

ftp = ftplib.FTP('ftp_portal', 'user_name', 'password')  

ftp.cwd('folder_of_interest')
ftp.retrlines('LIST')  

filenames = []  
ftp.retrlines('NLST', filenames.append)  

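# Write each file's name, modified date, and size.
# Note: MDTM and SIZE apply to files; a folder name returned by NLST may raise an error.
with open('C:\\path_to_file\\test.txt', 'w') as f:
    for filename in filenames:
        # MDTM replies "213 YYYYMMDDHHMMSS"; skip the 4-character reply prefix
        # and ignore any fractional seconds after the 14-digit timestamp.
        datetimeftp = ftp.sendcmd('MDTM ' + filename)
        modifiedTimeFtp = datetime.strptime(datetimeftp[4:18], "%Y%m%d%H%M%S").strftime("%d %b %Y %H:%M:%S")
        size = ftp.size(filename)
        filesize = "{:.2f}".format(size / 1024)
        f.write(filename + ':' + modifiedTimeFtp + ':' + filesize + ' KB\n')
# The with block closes the file, so no explicit f.close() is needed.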
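
The record count asked about in the question is not covered by the code above. One possible sketch follows, assuming the files are line-oriented text with one record per line: stream each file with `retrlines` and count the callbacks. This downloads the whole file, so it may be slow on a 7.6 GB portal, and binary files may not decode cleanly. The names `count_records` and `write_report` are hypothetical. Writing the report with the `csv` module also avoids the ambiguity of a `:` separator next to an `HH:MM:SS` timestamp.

```python
import csv


def count_records(ftp, filename):
    """Count text lines in a remote file by streaming it with RETR.

    Assumes one record per line; the full file is transferred.
    """
    count = 0

    def on_line(_line):
        nonlocal count
        count += 1

    ftp.retrlines("RETR " + filename, on_line)
    return count


def write_report(rows, out):
    """Write (name, modified, size_kb, records) tuples as CSV.

    `out` is any object with a write() method, e.g. a file opened
    with open(path, 'w', newline='').
    """
    writer = csv.writer(out)
    writer.writerow(["name", "modified", "size_kb", "records"])
    writer.writerows(rows)
```

Inside the loop above, a row could be collected as `(filename, modifiedTimeFtp, filesize, count_records(ftp, filename))` and the full list passed to `write_report` once the loop finishes.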
ASH