My goal: The AptDeco catalog page (URL in the code below) links to 60 pieces of furniture, and I want to scrape all 60 of those links. My approach is to (1) create a Selenium driver, (2) load the AptDeco page in that driver, (3) pass the loaded page's HTML to Beautiful Soup, and (4) extract all the links from the parsed HTML (see code below).
My issue: The HTML I am storing in the variable "html_page" only includes the first 6 pieces of furniture. I can reproduce the issue manually. If I open the URL in my browser, right-click, and select "View page source", the HTML only includes links to the first 6 items. If I instead right-click and select "Inspect", the HTML includes links to all 60 items. Is there a way to write code that pulls the HTML as it appears in the "Inspect" view rather than the "View page source" view? My hypothesis is that the page is dynamic: some JavaScript has already run in the "Inspect" version but not in the "View page source" version. I'm unsure how to get the version I want.
Edit: It was pointed out that I might need to wait for Ajax content to load. I ran two tests after loading the URL to confirm this isn't the issue. First, I checked whether any jQuery requests were still active (this raised an exception because the page has no jQuery). Second, I checked that document.readyState is "complete". After both tests, I ran the "html_page = driver.page_source" line again and still got the same result.
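The two checks can be sketched roughly as follows. The helper names (document_ready, jquery_active_requests) are illustrative and not my exact code; the only Selenium call assumed is driver.execute_script. The jQuery check is written so that it returns None instead of raising when the page has no jQuery.

```python
def document_ready(driver):
    """Return True once the browser reports the initial document as fully loaded."""
    return driver.execute_script("return document.readyState") == "complete"


def jquery_active_requests(driver):
    """Return the number of in-flight jQuery Ajax requests.

    Returns None when the page does not define window.jQuery, which avoids
    the JavaScript exception a bare `jQuery.active` would raise.
    """
    return driver.execute_script(
        "return window.jQuery ? window.jQuery.active : null"
    )


# Hypothetical usage with a real driver (not run here):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get('https://www.aptdeco.com/catalog')
#   print(document_ready(driver), jquery_active_requests(driver))
```

Note that readyState only describes the initial document load, so a "complete" value does not by itself rule out content that scripts add to the page later.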
from selenium import webdriver
from bs4 import BeautifulSoup
url = 'https://www.aptdeco.com/catalog'
driver = webdriver.Chrome()
driver.get(url)
html_page = driver.page_source
soup = BeautifulSoup(html_page, "html.parser")
# Print the href of every product card link found in the parsed HTML
for link in soup.find_all('a', class_='Card__CardLink-rr6223-1 crcHwb'):
    print(link.get('href'))