
I've written a function that downloads .gz files from a given FTP server. I want to extract them on the fly while downloading, and delete the compressed files afterwards. How can I do that?

import getpass
import os
import pathlib
from ftplib import FTP
from urllib.parse import urlparse

sinex_domain = "ftp://cddis.gsfc.nasa.gov/gnss/products/bias/2013"

def download(sinex_domain):
    user = getpass.getuser()
    sinex_parse = urlparse(sinex_domain)

    sinex_connetion = FTP(sinex_parse.netloc)
    sinex_connetion.login()
    sinex_connetion.cwd(sinex_parse.path)
    sinex_files = sinex_connetion.nlst()
    sinex_userpath = "C:\\Users\\" + user + "\\DCBviz\\sinex"
    pathlib.Path(sinex_userpath).mkdir(parents=True, exist_ok=True)

    for fileName in sinex_files:
        local_filename = os.path.join(sinex_userpath, fileName)
        file = open(local_filename, 'wb')
        sinex_connetion.retrbinary('RETR '+ fileName, file.write, 1024)
        
        #want to extract files in this loop

        file.close()

    sinex_connetion.quit()

download(sinex_domain)
alani

1 Answer


Although there is probably a cleverer way that avoids holding each file's entire contents in memory, these appear to be quite small files (a few tens of kilobytes uncompressed), so it is sufficient to read the compressed data into a BytesIO buffer and decompress it in memory before writing it to the output file. (The compressed data is never saved to disk.)

You would add these imports:

import gzip
from io import BytesIO

and then your main loop becomes:

    for fileName in sinex_files:
        local_filename = os.path.join(sinex_userpath, fileName)
        if local_filename.endswith('.gz'):
            local_filename = local_filename[:-3]
        data = BytesIO()
        sinex_connetion.retrbinary('RETR '+ fileName, data.write, 1024)
        data.seek(0)
        uncompressed = gzip.decompress(data.read())
        with open(local_filename, 'wb') as file:
            file.write(uncompressed)

(Note that the explicit file.close() is no longer needed, because the with block closes the file when it exits.)
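For larger files, the "cleverer way" mentioned above could be a streaming decompressor fed directly from the retrbinary callback, so that neither the compressed nor the uncompressed data is ever held in memory in full. The following is only a minimal sketch: make_gunzip_callback is a hypothetical helper name, not part of ftplib or gzip, and it assumes each file is a single gzip stream.

```python
import zlib


def make_gunzip_callback(out_file):
    """Return (callback, finish) that decompress gzip chunks into out_file.

    Hypothetical helper for illustration; assumes a single gzip stream.
    """
    # Adding 16 to MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)

    def callback(chunk):
        # Called once per downloaded block; write whatever can be decoded so far.
        out_file.write(decompressor.decompress(chunk))

    def finish():
        # Write any output still buffered inside the decompressor.
        out_file.write(decompressor.flush())

    return callback, finish


# Possible use inside the loop (names taken from the question):
#
#     with open(local_filename, 'wb') as file:
#         callback, finish = make_gunzip_callback(file)
#         sinex_connetion.retrbinary('RETR ' + fileName, callback, 1024)
#         finish()
```

For files of a few tens of kilobytes this brings no practical benefit over the BytesIO version, so it is probably only worth the extra code if the files grow much larger.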

alani
  • I have one more question @alaniwi. Maybe you'll know the answer. Is there a way to increase download speed? – Mateusz Piskorski Aug 19 '20 at 19:47
  • @MateuszPiskorski You could do some downloads in parallel (within reason - the ftp server admins probably will not thank you if the number in parallel is too large). But I think that if you need detailed help with that then you will need to ask a separate question. – alani Aug 19 '20 at 19:50
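
As a rough illustration of the parallel-download suggestion in the comment above, here is one possible sketch using a small thread pool. The function names fetch_one and download_parallel are hypothetical, and the worker count of 4 is an arbitrary, deliberately modest assumption. Each task opens its own FTP connection, since a single FTP control connection handles only one transfer at a time.

```python
import gzip
import os
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP
from io import BytesIO


def fetch_one(host, remote_dir, file_name, local_dir):
    """Download one file over its own connection and write it decompressed."""
    data = BytesIO()
    with FTP(host) as ftp:  # the connection is closed when the block exits
        ftp.login()
        ftp.cwd(remote_dir)
        ftp.retrbinary('RETR ' + file_name, data.write, 1024)
    # Drop the .gz suffix for the local name, as in the answer.
    local_name = file_name[:-3] if file_name.endswith('.gz') else file_name
    local_path = os.path.join(local_dir, local_name)
    with open(local_path, 'wb') as file:
        file.write(gzip.decompress(data.getvalue()))
    return local_path


def download_parallel(file_names, fetch, max_workers=4):
    """Run fetch(name) for every name on a small thread pool.

    Results come back in the same order as file_names.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, file_names))


# Possible use with the names from the question:
#
#     download_parallel(
#         sinex_files,
#         lambda name: fetch_one(sinex_parse.netloc, sinex_parse.path,
#                                name, sinex_userpath),
#     )
```

Whether this actually speeds things up depends on the server and the network, and keeping max_workers small respects the caveat about not overloading the FTP server.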