
Hello, I have a problem: I want to get all the data from a web page, but it is too large to hold in a variable. I currently fetch the data like this:

from urllib.request import urlopen
from bs4 import BeautifulSoup

r = urlopen("http://download.cathdb.info/cath/releases/all-releases/v4_2_0/cath-classification-data/cath-domain-list-v4_2_0.txt")
r = BeautifulSoup(r, "lxml")
r = r.p.get_text()
# some operations

This worked fine until I had to get the data from this page: http://download.cathdb.info/cath/releases/all-releases/v4_2_0/cath-classification-data/cath-domain-description-file-v4_2_0.txt

When I run the same code as above on that page, my program stops at the line

r = BeautifulSoup(r, "lxml")

and it hangs forever; nothing happens. I don't know how to process all this data without saving it to a file, so that I can search it for keywords and print the matches. I can't save it to a file; I have to work on it straight from the website.

I would be very thankful for any help.

Tai
Ppyyt

1 Answer


I think the code below does what you want. As @alecxe mentioned in a comment, you don't need BeautifulSoup here: the file is plain text, not HTML, so parsing it with an HTML parser is both unnecessary and what is making your program stall. This is really a question about reading the contents of a text file from a URL, which is answered in Given a URL to a text file, what is the simplest way to read the contents of the text file?

from urllib.request import urlopen

r = urlopen("http://download.cathdb.info/cath/releases/all-releases/v4_2_0/cath-classification-data/cath-domain-list-v4_2_0.txt")

# The response object is iterable, yielding one line (as bytes) at a time,
# so the whole file is never held in memory at once.
for line in r:
    do_something()
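Since the goal is to search for keywords and print the matching lines, the loop above can be fleshed out into a small generator. This is a minimal sketch: the `iter_matching_lines` name and the sample lines are my own illustration, and `BytesIO` just stands in for the `urlopen(...)` response, which is likewise a line-iterable stream of bytes.

```python
from io import BytesIO

def iter_matching_lines(stream, keyword):
    """Yield decoded lines containing keyword, one at a time,
    without loading the whole stream into memory."""
    for raw in stream:  # each raw line is a bytes object
        line = raw.decode("utf-8", errors="replace")
        if keyword in line:
            yield line.rstrip("\n")

# Simulated response body; with the real site you would pass
# urlopen(url) here instead of BytesIO(...).
fake_response = BytesIO(b"DOMAIN 1abcA00\nCOMMENT none\nDOMAIN 2xyzB01\n")
matches = list(iter_matching_lines(fake_response, "DOMAIN"))
```

Because the generator only holds one line at a time, it works the same way on the large domain-description file that made BeautifulSoup choke.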
chluebi