I have a CSV file on an FTP server. The file is around 200 MB.

For now, I am reading the file using the following method. The issue with this implementation is that the file takes too long to download: the retrbinary call takes around 12 minutes to execute. I tried different block sizes and was able to get the time down to 11 minutes, which is still too much.

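import io
import pandas

# ftp is an already connected ftplib.FTP instance; file_path is the remote path
download_file = io.BytesIO()
ftp.retrbinary("RETR {}".format(file_path), download_file.write, 8024)
download_file.seek(0)
dataframe = pandas.read_csv(download_file, nrows=4)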

I need help reading the file in chunks. I only need the first 4 rows of the file.

Martin Prikryl

1 Answer

To read the first 4 lines of a remote file only, use:

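import io

download_file = io.BytesIO()

# Switch to ASCII mode so the file can be read line by line
ftp.sendcmd('TYPE A')
# Open the data connection directly instead of going through retrbinary
conn = ftp.transfercmd("RETR {}".format(file_path))
fp = conn.makefile('rb')
count = 0
while count < 4:
    line = fp.readline(ftp.maxline + 1)
    if not line:
        break  # end of file reached before 4 lines were read
    download_file.write(line)
    count += 1
# Close the data connection without reading the rest of the file
fp.close()
conn.close()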
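
The loop above can be wrapped in a small helper, which makes it easier to try out without a server. This is only a minimal sketch and not part of the original answer: `read_first_lines` and its `max_line_len` parameter are hypothetical names, the sample data is made up, and an in-memory `io.BytesIO` stands in for `conn.makefile('rb')`.

```python
import io


def read_first_lines(fp, n, max_line_len=8192):
    """Copy at most n lines from a binary file-like object into a BytesIO."""
    buf = io.BytesIO()
    for _ in range(n):
        line = fp.readline(max_line_len + 1)
        if not line:  # end of stream before n lines were read
            break
        buf.write(line)
    buf.seek(0)  # rewind so a parser starts at the first byte
    return buf


# Stand-in for conn.makefile('rb'); the sample data is invented.
sample = io.BytesIO(b"id,name\r\n1,a\r\n2,b\r\n3,c\r\n4,d\r\n5,e\r\n")
head = read_first_lines(sample, 5)  # header line + 4 data rows
```

With a real connection you would pass `conn.makefile('rb')` as `fp` and then call `pandas.read_csv(head, nrows=4)`. If the first line of the CSV is a header, four data rows require reading five lines, which is why the sketch asks for 5. Because the data connection is closed before the transfer completes, the server's final reply is still pending on the control connection, so it is probably safest to close `ftp` afterwards rather than reuse it.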

If you really wanted to process the whole file in chunks, it would be considerably more complicated, given the APIs of ftplib and pandas, but it is possible. For some ideas, see: Get files names inside a zip file on FTP server without downloading whole archive.
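
As a rough illustration of chunked processing, here is one possible simplification that sidesteps pandas entirely and parses rows with the standard `csv` module. It is a sketch under assumptions, not the approach from the linked question: `iter_row_chunks` is a hypothetical helper, UTF-8 encoding is assumed, and an in-memory `io.BytesIO` again stands in for the file object returned by `conn.makefile('rb')`.

```python
import csv
import io
from itertools import islice


def iter_row_chunks(binary_fp, chunk_size, encoding="utf-8"):
    """Yield lists of up to chunk_size parsed CSV rows from a binary stream."""
    # newline="" lets the csv module handle \r\n and \n line endings itself
    text = io.TextIOWrapper(binary_fp, encoding=encoding, newline="")
    reader = csv.reader(text)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:  # stream exhausted
            return
        yield chunk


# Stand-in for conn.makefile('rb'); the sample data is invented.
sample = io.BytesIO(b"id,name\r\n1,a\r\n2,b\r\n3,c\r\n")
chunks = list(iter_row_chunks(sample, 2))
```

Each chunk is a plain list of rows, so only `chunk_size` rows are held in memory at a time. The header row arrives as the first row of the first chunk and would need to be handled by the caller.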

Martin Prikryl