I'm doing an exercise in scraping data from a website, for example ZocDoc. I'm trying to get a list of all insurance providers and their plans (you can access this information on their homepage in the insurance dropdown).
It appears that all the data is loaded via a <script> tag when the page loads. Looking in the network tab, there don't appear to be any network calls that return JSON including the plan names. I am able to get all the insurance plans with the following (it's messy, but it works):
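import json
import requests
from bs4 import BeautifulSoup as bs
resp = requests.get('https://zocdoc.com')
soup = bs(resp.text, 'html.parser')
# The embedded data sits in the 18th <script> tag (hard-coded index)
long_str = str(soup.findAll('script')[17].string)
# Keep everything after the "Popular Insurances" label
pop = long_str.split("Popular Insurances")[1]
# Parse the first [[...]] array that follows the label
json.loads(pop[pop.find("[["):pop.find("]]")+2])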
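A slightly less brittle variant of the same idea, as a rough sketch only: rather than relying on the script being at index 17 and on the first "]]" closing the array, it scans every inline script for the marker and lets `json.JSONDecoder.raw_decode` find the end of the array. The helper name `extract_array_after`, the regex used in place of BeautifulSoup (to keep the sketch stdlib-only), and the sample HTML are all my own assumptions, not ZocDoc's actual page layout.

```python
import json
import re

# Rough stand-in for soup.findAll('script'): captures the body of each <script> element.
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


def extract_array_after(html, marker="Popular Insurances"):
    """Return the first JSON array of arrays following `marker` in any inline script, or None."""
    decoder = json.JSONDecoder()
    for body in SCRIPT_RE.findall(html):
        pos = body.find(marker)
        if pos == -1:
            continue  # this script does not mention the marker
        start = body.find("[[", pos)
        if start == -1:
            continue  # marker present, but no nested array after it
        try:
            # raw_decode parses one JSON value from the start of the string
            # and ignores whatever trails it, so no need to search for "]]".
            value, _end = decoder.raw_decode(body[start:])
        except json.JSONDecodeError:
            continue
        return value
    return None


# Made-up sample page (hypothetical structure and IDs, for illustration only).
sample_html = """
<html><body>
<script>var unrelated = [[1, 2]];</script>
<script>var data = {"label": "Popular Insurances", "items": [["Aetna", 300], ["Cigna", 307]], "more": [[9]]};</script>
</body></html>
"""

print(extract_array_after(sample_html))
```

On a real page you would pass `resp.text` instead of `sample_html`. This still only recovers whatever is embedded in the page source, so it does not help if the plans are not there.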
In the HTML returned there are no insurance plans. I also don't see any requests in the network tab where the plans are sent back (there are a few backbone files). One URL looks encoded, but I'm not sure whether it is the source of the plans or whether I'm just overthinking it.
I've also tried using dryscrape to wait for all the JS to load so that the data is in the DOM, but there are still no plans in the HTML.
Is there a way to gather this information without having a crawler click on every insurance provider to get their plans?