Fixing FTP webscraping script on Python 3.5

Question

I want to extract a text file from an FTP server. This is the code I already have:

from ftplib import FTP
import re

def my_function(data):
    print(data)

ftp = FTP('ftp.nasdaqtrader.com')
ftp.login()
nasdaq=ftp.retrbinary('RETR /SymbolDirectory/nasdaqlisted.txt', my_function)
#nasdaq contains the text file

I've had a couple of problems with this approach. First, every time I run the script the whole file is printed, which I really don't want; I just need the contents stored as a string in the variable "nasdaq". Second, even though the script prints out this line:

b'Symbol|Security Name|Market Category|Test Issue|Financial Status|Round Lot Size|ETF|NextShares\r\nAAAP|Advanced Accelerator Applications S.A. - American Depositary Shares

I can't prove it to be in "nasdaq":

print ("\r\nAAAP|Advanced Accelerator Applications S.A." in nasdaq)
Out: False

What would be a more pythonic approach?

You can't `print("\r\nAAAP|Advanced Accelerator Applications S.A." in nasdaq)` because it would raise a TypeError: 'str' does not support the buffer interface. - Amin Etesamian, Jan 16 '17 at 20:09
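
A note on the membership test: as far as I can tell, retrbinary returns the server's reply line rather than the file contents, and the callback receives bytes, not str. The sketch below uses no network at all; the byte string, the data row, and the reply text are hypothetical stand-ins, and contains() is a helper invented for illustration.

```python
# Hypothetical bytes standing in for one block passed to the callback;
# the data row is invented for illustration.
raw = b"Symbol|Security Name\r\nXMPL|Example Corp - Common Stock\r\n"
needle = "\r\nXMPL|Example Corp"


def contains(haystack, needle):
    """Membership test that reports a str/bytes mismatch instead of raising."""
    try:
        return needle in haystack
    except TypeError:
        return "TypeError"


mixed = contains(raw, needle)              # str needle, bytes haystack
decoded = contains(raw.decode(), needle)   # str needle, str haystack
response = contains("226 Transfer complete", needle)  # hypothetical reply text
print(mixed, decoded, response)
```

Searching the raw bytes with a str fails outright, searching the decoded text succeeds, and searching a reply line simply finds nothing, which would be consistent with the False seen in the question.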

score 2 · Accepted Answer · edited May 23 '17 at 12:16

This is essentially a duplicate of "Is it possible to read FTP files without writing them using Python?", but I wanted to show how to apply it to your specific case.

from ftplib import FTP
from io import BytesIO

data = BytesIO()
with FTP("ftp.nasdaqtrader.com") as ftp: # use context manager to avoid
    ftp.login()                          # leaving connection open by mistake
    ftp.retrbinary("RETR /SymbolDirectory/nasdaqlisted.txt", data.write)
data.seek(0) # need to go back to the beginning to get content
nasdaq = data.read().decode() # convert bytes back to string

nasdaq should now be a string containing the contents of the indicated file, with Windows-style \r\n line endings. Calling nasdaq.split("\r\n") or nasdaq.splitlines() gives a list with each line as an element.
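
Since the file is pipe-delimited, one possible next step is to turn the text into one dict per row with the csv module. This is only a sketch: the sample string stands in for the downloaded text (the header is the one shown in the question, the data row is made up), and parse_listing is a hypothetical helper name.

```python
import csv
from io import StringIO

# Hypothetical stand-in for the downloaded text; the header matches the
# question's output, but the data row is invented for illustration.
sample = (
    "Symbol|Security Name|Market Category|Test Issue|"
    "Financial Status|Round Lot Size|ETF|NextShares\r\n"
    "XMPL|Example Corp - Common Stock|Q|N|N|100|N|N\r\n"
)


def parse_listing(text):
    """Return one dict per data row of a pipe-delimited listing."""
    # newline="" leaves the \r\n endings untouched so csv can handle them.
    reader = csv.DictReader(StringIO(text, newline=""), delimiter="|")
    return list(reader)


rows = parse_listing(sample)
symbols = [row["Symbol"] for row in rows]
print(symbols)
```

With the real download you would pass nasdaq in place of sample. Check the last lines of the actual file first, since any footer line in the listing would come through as an ordinary row.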
