I'm practicing web scraping with Selenium and trying to scrape all the product links from the Lululemon women's page. When I use XPath to locate the product URLs, the part of the XPath that differs between products sits in the middle of the path rather than at the end, so I can't simply loop over the paths as I expected.
For example, the XPaths of three products are:
/html/body/div[1]/div/main/div/section/div/div[3]/div[2]/div[2]/div/div[133]/div/div/div[2]/h3/a
/html/body/div[1]/div/main/div/section/div/div[3]/div[2]/div[2]/div/div[134]/div/div/div[2]/h3/a
/html/body/div[1]/div/main/div/section/div/div[3]/div[2]/div[2]/div/div[1]/div/div/div[2]/h3/a
The only difference between these XPaths is the index 133, 134, or 1, which seems to identify each product on the page.
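To make the pattern concrete, here is a minimal sketch that uses only Python's standard library on a made-up, heavily simplified grid (the markup and hrefs are hypothetical, not Lululemon's real HTML). It suggests that dropping the positional index makes the same path match every product instead of just one:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the product grid (not the real page).
SAMPLE = """
<div id="grid">
  <div><div><h3><a href="/p/one">One</a></h3></div></div>
  <div><div><h3><a href="/p/two">Two</a></h3></div></div>
  <div><div><h3><a href="/p/three">Three</a></h3></div></div>
</div>
"""

root = ET.fromstring(SAMPLE)

# With a positional index, the path selects exactly one product link.
second = root.findall("./div[2]/div/h3/a")

# Without the index, the same path selects the link of every product.
all_links = root.findall("./div/div/h3/a")

hrefs = [a.get("href") for a in all_links]
print(hrefs)
```

If the rest of the path really is identical for every product, the same idea would presumably carry over to Selenium by removing the `[133]` part from the XPath and asking for all matching elements rather than a single one.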
So how can I build a list of all the products (assuming XPath is the right tool) that I can loop through to get every single product's link? Can anyone help? I've pasted my current code and attached a screenshot for reference. Thank you so much!
from time import sleep
from selenium import webdriver

# this is how I got the web page
driver_path = 'D:/Python/Selenium/chromedriver'
url = "https://shop.lululemon.com/c/womens-leggings/_/N-8s6"
max_pass = 5
# set up Chrome (kept open after the script ends) before getting each product's url
option1 = webdriver.ChromeOptions()
option1.add_experimental_option('detach', True)
driver = webdriver.Chrome(options=option1, executable_path=driver_path)
driver.get(url)
sleep(2)
for i in range(max_pass):
    sleep(3)
    # try to click the button at either of two possible paths; ignore it if not found
    try:
        driver.find_element_by_xpath('/html/body/div[1]/div/main/div/section/div/div[4]/div/button/span').click()
    except Exception:
        pass
    try:
        driver.find_element_by_xpath('/html/body/div[1]/div/main/div/section/div/div[2]/div/button/span').click()
    except Exception:
        pass
    sleep(3)
    # scroll to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
# The next step should be to find the pattern for where each URL is located (this should give a list), then loop through that list to get the "href" of every single product.
# By the way, I have also tried to locate the links by the class name "link lll-font-weight-medium", but I don't know why Python reports "Message: chrome not reachable (Session info: chrome=95.0.4638.69)".
[p.get_attribute('href') for p in driver.find_elements_by_class_name('link lll-font-weight-medium')] #this doesn't work
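One thing that may matter for the last line, separately from the "chrome not reachable" message: a class-name lookup is meant for a single class name, while "link lll-font-weight-medium" is two class names separated by a space. Below is a hedged sketch that uses a CSS selector instead. `collect_hrefs` is a hypothetical helper, the selector is only an assumption based on the class names above, and the string "css selector" is the value of Selenium's `By.CSS_SELECTOR` constant:

```python
def collect_hrefs(driver, css="a.link.lll-font-weight-medium"):
    """Return the unique href values of all elements matching `css`, in page order.

    `driver` is expected to be a Selenium WebDriver; the default selector is an
    assumption and may not match the real page.
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR
    elements = driver.find_elements("css selector", css)
    hrefs = [el.get_attribute("href") for el in elements]
    # drop missing hrefs and remove duplicates while keeping the original order
    return list(dict.fromkeys(h for h in hrefs if h))
```

It would be called as `links = collect_hrefs(driver)` after the scrolling loop has finished, while the browser session is still open.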