EDIT
The old code couldn't pickle the soup object due to a RecursionError:
Traceback (most recent call last):
  File "soup.py", line 20, in <module>
    pickle.dump(soup, f)
RecursionError: maximum recursion depth exceeded while calling a Python object
The usual solution is to increase the recursion limit. The same fix is used in this answer, which in turn references the docs.
HOWEVER, the particular site you're trying to load and save is extremely nested. My machine can't get past a recursion limit of 50000, and even that isn't enough for your site; the script crashes with:
10008 segmentation fault (core dumped)  python soup.py
So, if you need to download the HTML and use it later, you can do this instead:
from bs4 import BeautifulSoup
from urllib.request import urlopen

url = "https://coinmarketcap.com/all/views/all/"
html = urlopen(url)

# Save the raw HTML to a file, streaming it in chunks
with open("soup.html", "wb") as f:
    while True:
        chunk = html.read(1024)
        if not chunk:
            break
        f.write(chunk)
Then you can read the saved HTML file back and build the BeautifulSoup object from it:
# Read the HTML back from the file and parse it again
with open("soup.html", "rb") as f:
    soup = BeautifulSoup(f.read(), "lxml")

print(soup.title)
# <title>All Cryptocurrencies | CoinMarketCap</title>
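In fact, if you already have a soup object in memory, you don't need pickle at all: str(soup) serializes the parsed tree back to HTML text, which you can write out and re-parse later with no recursion limit involved. A minimal sketch of that round trip, using a made-up inline page and Python's built-in html.parser (so it runs offline; the code above uses lxml):

```python
from bs4 import BeautifulSoup

html = "<html><head><title>Demo page</title></head><body><p>hi</p></body></html>"
soup = BeautifulSoup(html, "html.parser")  # html.parser avoids the lxml dependency

# Serialize the parsed tree back to HTML text -- no pickle, no recursion limit
with open("soup.html", "w", encoding="utf-8") as f:
    f.write(str(soup))

# Re-parse later; the new soup is equivalent for querying purposes
with open("soup.html", encoding="utf-8") as f:
    soup2 = BeautifulSoup(f.read(), "html.parser")

print(soup2.title)
# <title>Demo page</title>
```

Note that the re-parsed tree is a new object, not the original one, but for searching and extracting data that usually makes no difference.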
Additionally, this is the code I would use for a less deeply nested site:
import pickle
import sys
from bs4 import BeautifulSoup
from urllib.request import urlopen

url = "https://stackoverflow.com/questions/52973700/how-to-save-the-beautifulsoup-object-to-a-file-and-then-read-from-it-as-beautifu"
html = urlopen(url)
soup = BeautifulSoup(html, "lxml")

# Raise the recursion limit so pickle can walk the nested tree
sys.setrecursionlimit(8000)

# Save the soup object to a file
with open("soup.pickle", "wb") as f:
    pickle.dump(soup, f)

# Read the soup object back from the file
with open("soup.pickle", "rb") as f:
    soup_obj = pickle.load(f)

print(soup_obj.title)
# <title>python - How to save the BeautifulSoup object to a file and then read from it as BeautifulSoup? - Stack Overflow</title>