I want to build a list of the URLs of all the diamonds in the table on Blue Nile, which should be ~142K entries. Since the page only loads more rows as you scroll, my first approach was to scroll to the end of the page before scraping. However, that only ever scraped a maximum of 1000 elements. I learned this is due to the issues outlined in this question: Selenium find_elements_by_id() doesn't return all elements, but the solutions there aren't clear and straightforward to me.
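For reference, that first attempt was roughly the usual scroll-to-bottom loop (a minimal sketch, not my exact code):

import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.bluenile.com/diamond-search?pt=setform")

SCROLL_PAUSE_TIME = 0.5
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Jump to the bottom so the page loads the next batch of rows
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(SCROLL_PAUSE_TIME)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # no new content loaded, assume we reached the end
    last_height = new_height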
So instead I tried scrolling the page by a fixed amount and scraping as I go, until the page reaches the end. However, I still only ever get the initial 50 unique elements.
import time
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.bluenile.com/diamond-search?pt=setform")
source_site = 'www.bluenile.com'

SCROLL_PAUSE_TIME = 0.5
last_height = driver.execute_script("return document.body.scrollHeight")
print(last_height)
new_height = 500

diamond_urls = []
soup = BeautifulSoup(driver.page_source, "html.parser")
count = 0
while new_height < last_height:
    # Collect the href of every diamond row found in the soup
    for url in soup.find_all('a', class_='grid-row row TL511DiaStrikePrice', href=True):
        full_url = source_site + url['href'][1:]
        diamond_urls.append(full_url)
        count += 1
        if count == 50:
            # After every 50 rows, scroll down to trigger loading more
            driver.execute_script("window.scrollBy(0, 500);")
            time.sleep(SCROLL_PAUSE_TIME)
            new_height += 500
            print(new_height)
            count = 0
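My suspicion is that soup is parsed once before the loop, so every iteration re-reads the same initial 50 rows. I assume the fix involves re-parsing driver.page_source after each scroll, something like the sketch below, but I'm not sure this alone gets past the limits described above:

while new_height < last_height:
    # Re-parse the rendered page so newly loaded rows are visible to BeautifulSoup
    soup = BeautifulSoup(driver.page_source, "html.parser")
    for url in soup.find_all('a', class_='grid-row row TL511DiaStrikePrice', href=True):
        full_url = source_site + url['href'][1:]
        if full_url not in diamond_urls:  # skip rows already seen on a previous pass
            diamond_urls.append(full_url)
    driver.execute_script("window.scrollBy(0, 500);")
    time.sleep(SCROLL_PAUSE_TIME)
    new_height += 500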
Please help me find the issue with my code above or suggest a better solution. Thanks!