
Hello, I have a problem: I want to get all the data from a web page, but it is too large to hold in a variable. I currently fetch the data like this:

from urllib.request import urlopen
from bs4 import BeautifulSoup

r = urlopen("http://download.cathdb.info/cath/releases/all-releases/v4_2_0/cath-classification-data/cath-domain-list-v4_2_0.txt")
r = BeautifulSoup(r, "lxml")
r = r.p.get_text()
# some operations

This worked fine until I had to get the data from this page: http://download.cathdb.info/cath/releases/all-releases/v4_2_0/cath-classification-data/cath-domain-description-file-v4_2_0.txt

When I run the same code as above on that page, my program stops at the line

r = BeautifulSoup(r, "lxml")

and it hangs forever; nothing happens. I don't know how to process all this data without saving it to a file, so that I can search it for keywords and print the matches. I can't save it to a file; I have to work on it straight from the website.

I would be very thankful for any help.

Tai
Ppyyt

1 Answer


I think the code below does what you want. As @alecxe mentioned in a comment, you don't need BeautifulSoup here: the file is plain text, not HTML, so parsing it with an HTML parser is both unnecessary and what is making your program stall. This is really a question about reading the contents of a text file from a URL, which is answered in Given a URL to a text file, what is the simplest way to read the contents of the text file?

from urllib.request import urlopen

r = urlopen("http://download.cathdb.info/cath/releases/all-releases/v4_2_0/cath-classification-data/cath-domain-list-v4_2_0.txt")

# The response object is iterable, yielding one line (as bytes) at a time,
# so the whole file is never held in memory at once.
for line in r:
    do_something()
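Since the goal is to search for keywords and print the matching lines, the loop above can be fleshed out into a small generator. This is a minimal sketch: the `iter_matching_lines` name and the sample lines are my own illustration, and `BytesIO` just stands in for the `urlopen(...)` response, which is likewise a line-iterable stream of bytes.

```python
from io import BytesIO

def iter_matching_lines(stream, keyword):
    """Yield decoded lines containing keyword, one at a time,
    without loading the whole stream into memory."""
    for raw in stream:  # each raw line is a bytes object
        line = raw.decode("utf-8", errors="replace")
        if keyword in line:
            yield line.rstrip("\n")

# Simulated response body; with the real site you would pass
# urlopen(url) here instead of BytesIO(...).
fake_response = BytesIO(b"DOMAIN 1abcA00\nCOMMENT none\nDOMAIN 2xyzB01\n")
matches = list(iter_matching_lines(fake_response, "DOMAIN"))
```

Because the generator only holds one line at a time, it works the same way on the large domain-description file that made BeautifulSoup choke.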
chluebi