Two things:
- There's a pop-up "Allow" box that needs to be clicked before you get the page source
- Your link is a direct child of a span, not a div
Code
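import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(r'c:\users\aaron\chromedriver.exe'))
driver.get('https://couponscorpion.com/marketing/complete-guide-to-pinterest-pinterest-growth-2020/')
time.sleep(5)  # crude wait for the page and the pop-up box to load

# Click the button on the pop-up box before reading the page source
driver.find_element(By.XPATH, '//button[@class="align-right primary slidedown-button"]').click()
content = driver.page_source
driver.quit()

soup = BeautifulSoup(content, 'html.parser')
# The link sits inside a span, not a div
course_link = soup.find_all('span', {'class': 'rh_button_wrapper'})
for i in course_link:
    link = i.find('a', href=True)
    if link is None:
        print('No Links Found')
    else:
        print(link['href'])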
Output
https://couponscorpion.com/scripts/udemy/out.php?go=Q25aTzVXS1l0TXg1TExNZHE5a3pEUEM4SUxUZlBhWEhZWUwwd2FnS3RIVC96cE5lZEpKREdYcUFMSzZZaGlCM0V6RzF1eUE3aVJNaURZTFp5L0tKeVZ4dmRjOTcxN09WbVlKVXhOOGtIY2M9&s=e89c8d0358244e237e0e18df6b3fe872c1c1cd11&n=1298829005&a=0
Explanation
Always look at what happens in the browser after driver.get(): sometimes there are boxes that need to be clicked before you can get the page source. Selenium drives a real browser, so any clicks a user would have to make, your script has to make too.
Here we find the button on that box using an XPath selector:
//button[@class="align-right primary slidedown-button"]
This means:
// - search the entire DOM, at any depth
button - the HTML tag we want
[@class="..."] - only tags whose class attribute is exactly "..."
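To see what that predicate does, here is a small sketch using Python's built-in xml.etree.ElementTree on some made-up markup (not the real page). ElementTree only supports a subset of XPath and needs a leading ".", but the [@class="..."] test behaves the same way: it compares the whole attribute string, so the same classes in a different order do not match.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the pop-up box (not the real page)
html = """
<div>
  <button class="align-right primary slidedown-button">Allow</button>
  <button class="primary align-right slidedown-button">Same classes, other order</button>
  <button class="secondary slidedown-button">No thanks</button>
</div>
"""
root = ET.fromstring(html.strip())

# Any <button> at any depth whose class attribute is exactly this string
matches = root.findall('.//button[@class="align-right primary slidedown-button"]')
print([b.text for b in matches])  # ['Allow']
```

In a browser (which is what Selenium uses) you could loosen the match with the XPath 1.0 function contains(@class, "slidedown-button"); ElementTree does not support that function.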
I usually add a short wait before accessing elements. This page took a while to load, and often you need to wait before you can get the element or part of the page you want.
There are a couple of ways to do that; here I used the quick and dirty method with the time module (time.sleep(5)). Selenium also has specific ways to wait for elements to appear (explicit waits). I had a go at these but wasn't able to get them to work.
See the Selenium documentation on waits for the specific parts worth knowing about.
If you look at the HTML, you will see that the link is inside a span element of class rh_button_wrapper, not a div.
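A quick way to check this is a BeautifulSoup sketch on a trimmed-down, hypothetical version of the markup (the href is a placeholder, not the real coupon link). Searching for a div with that class finds nothing, while the CSS selector span.rh_button_wrapper > a[href], passed to soup.select, picks out <a> tags with an href that are direct children of that span.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup: the link sits directly inside a span
html = '''
<div>
  <span class="rh_button_wrapper">
    <a href="https://example.com/out.php?go=abc">Get coupon</a>
  </span>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# Looking for the wrapper as a div finds nothing
div_wrappers = soup.find_all('div', {'class': 'rh_button_wrapper'})

# <a> tags with an href that are direct children of span.rh_button_wrapper
links = [a['href'] for a in soup.select('span.rh_button_wrapper > a[href]')]

print(len(div_wrappers))  # 0
print(links)  # ['https://example.com/out.php?go=abc']
```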