3

I want to download multiple files from FTP in python. the my code works when I just download 1 file, but not works for more than one!

import urllib
urllib.urlretrieve('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC1790863.tar.gz', 'file1.tar.gz')
urllib.urlretrieve('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC2329613.tar.gz', 'file2.tar.gz')

An error say:

Traceback (most recent call last):
  File "/home/ehsan/dev_center/bigADEVS-bknd/daemons/crawler/ftp_oa_crawler.py", line 3, in <module>
    urllib.urlretrieve('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC2329613.tar.gz', 'file2.tar.gz')
  File "/usr/lib/python2.7/urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "/usr/lib/python2.7/urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "/usr/lib/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 558, in open_ftp
    (fp, retrlen) = self.ftpcache[key].retrfile(file, type)
  File "/usr/lib/python2.7/urllib.py", line 906, in retrfile
    conn, retrlen = self.ftp.ntransfercmd(cmd)
  File "/usr/lib/python2.7/ftplib.py", line 334, in ntransfercmd
    host, port = self.makepasv()
  File "/usr/lib/python2.7/ftplib.py", line 312, in makepasv
    host, port = parse227(self.sendcmd('PASV'))
  File "/usr/lib/python2.7/ftplib.py", line 830, in parse227
    raise error_reply, resp
IOError: [Errno ftp error] 200 Type set to I

What should I do?

ehsan shirzadi
  • 4,709
  • 16
  • 69
  • 112

1 Answers1

5

It is a bug in urllib in python 2.7. Reported here. The reason behind the same is explained here

Now, when a user tries to download the same file or another file from same directory, the key (host, port, dirs) remains the same so open_ftp() skips ftp initialization. Because of this skipping, previous FTP connection is reused and when new commands are sent to the server, server first sends the previous ACK. This causes a domino effect and each response gets delayed by one and we get an exception from parse227()

A possible solution is to clear the cache that may have been built up by previous calls. You may use the urllib.urlcleanup() method calls between your urlretrieve calls for the same, as mentioned here.

Hope this helps!

Rudresh Panchal
  • 980
  • 4
  • 16