I am trying to scrape some data from Yahoo. I have written a script which works - some of the time. Sometimes when I run the script, I am able to download the complete page - other times, the page is only partially loaded - with the data portion missing.
What is even more perplexing, is that when I navigate to that page in my browser, the entire page is shown.
Here is the gist of my code:
import dryscrape
from bs4 import BeautifulSoup
url = 'http://finance.yahoo.com/quote/SPY/options?p=SPY&straddle=false'
sess = dryscrape.Session()
sess.set_header('user-agent', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0')
sess.set_attribute('auto_load_images', False)
sess.set_timeout(360)
sess.visit(url)
soup = BeautifulSoup(sess.body(), 'lxml')
# Related to memory leak issue in webkit
sess.reset()
# Barfs (sometimes!) at the line below
sel_list = soup.find('select', class_='Fz(s)')
if sel_list is None or len(sel_list) == 0:
print('element not found on page!')
I have attached images of the pages fetched below. Here is the web page, when viewed over the internet, via a web browser:
Now, here is the page I pulled down via a script similar to the one shown above - and it has no data!:
Can anyone work out why the element is sometimes missing when the data is fetched by my script? Equally (more?) importantly, how may I fix this?