
I'm trying to scrape a dynamically loaded href attribute with Selenium and BeautifulSoup4.

When I use view-source on the website, the href attribute is empty, but when I inspect the element, the href attribute contains a link. That means the href attribute is loaded dynamically. How can I extract that link?
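To see why view-source and the inspector disagree, here is a minimal sketch that parses two made-up HTML snippets with BeautifulSoup: one mimicking the raw source (empty href) and one mimicking the rendered DOM after JavaScript has filled the attribute in. The markup, the example.com URL and the helper name non_empty_hrefs are hypothetical; only the rh_button_wrapper class name comes from this page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: what view-source returns (href not filled in yet)
STATIC_HTML = '<span class="rh_button_wrapper"><a href="">GET COUPON CODE</a></span>'

# Hypothetical markup: the same element after JavaScript has set the href
RENDERED_HTML = (
    '<span class="rh_button_wrapper">'
    '<a href="https://example.com/out.php?go=abc">GET COUPON CODE</a></span>'
)


def non_empty_hrefs(html):
    """Return every non-empty href inside .rh_button_wrapper elements (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for a in soup.select(".rh_button_wrapper a"):
        href = (a.get("href") or "").strip()
        if href:  # skip placeholders such as href=""
            hrefs.append(href)
    return hrefs


print(non_empty_hrefs(STATIC_HTML))    # []
print(non_empty_hrefs(RENDERED_HTML))  # ['https://example.com/out.php?go=abc']
```

This is why the page has to be rendered by a real browser (Selenium) and driver.page_source parsed after the page's scripts have run, rather than parsing the raw HTML.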

I am trying with the following code:

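from bs4 import BeautifulSoup
from selenium import webdriver

# driver is created elsewhere in the original; a plain Chrome driver is assumed here
driver = webdriver.Chrome()


def Scrape_Udemy():
    driver.get('https://couponscorpion.com/marketing/complete-guide-to-pinterest-pinterest-growth-2020/')
    content = driver.page_source
    soup = BeautifulSoup(content, 'html.parser')
    course_link = soup.find_all('div', {'class': "rh_button_wrapper"})
    for i in course_link:
        link = i.find('a', href=True)
        if link is None:
            print('No Links Found')
            continue  # avoid indexing None below
        print(link['href'])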

But when I run the function, it prints []. I am using ChromeDriver. How can I solve this? I want to scrape the FREE COUPON CODE link from https://couponscorpion.com/marketing/complete-guide-to-pinterest-pinterest-growth-2020/

V K

1 Answer


Two things:

  1. There's a notification box with an allow button that needs to be clicked before you get the page source.
  2. Your link is a direct child of a span, not a div.

Code

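import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style: executable_path= and find_element_by_xpath were removed in Selenium 4.x
driver = webdriver.Chrome(service=Service(r'c:\users\aaron\chromedriver.exe'))
driver.get('https://couponscorpion.com/marketing/complete-guide-to-pinterest-pinterest-growth-2020/')
time.sleep(5)  # crude wait for the page and the notification box to load
# Click the button on the notification box before reading the page source
driver.find_element(By.XPATH, '//button[@class="align-right primary slidedown-button"]').click()
content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')
# The coupon link sits inside a span, not a div
course_link = soup.find_all('span', {'class': "rh_button_wrapper"})
for i in course_link:
    link = i.find('a', href=True)
    if link is None:
        print('No Links Found')
    else:
        print(link['href'])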

Output

https://couponscorpion.com/scripts/udemy/out.php?go=Q25aTzVXS1l0TXg1TExNZHE5a3pEUEM4SUxUZlBhWEhZWUwwd2FnS3RIVC96cE5lZEpKREdYcUFMSzZZaGlCM0V6RzF1eUE3aVJNaURZTFp5L0tKeVZ4dmRjOTcxN09WbVlKVXhOOGtIY2M9&s=e89c8d0358244e237e0e18df6b3fe872c1c1cd11&n=1298829005&a=0

Explanation

Always check what happens when you call driver.get(); sometimes there are boxes that need to be clicked before you can get the page source. Any interaction a user would make in the browser has to be performed through Selenium.

Here we find the button on that box and click it using an XPath selector:

//button[@class="align-right primary slidedown-button"]

This means:

// - search the entire DOM, at any depth
button - the HTML tag we want
[@class=""] - only elements whose class attribute is exactly the given string
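To make the exact-match behaviour of [@class="..."] concrete, the sketch below uses Python's standard-library xml.etree.ElementTree, which supports a small XPath subset, on made-up markup rather than the real page. It is a stand-in for the browser's XPath engine, not Selenium itself; note that ElementTree writes "anywhere below this node" as .// rather than //.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for the notification box
BOX = """
<div>
  <button class="align-right primary slidedown-button">Allow</button>
  <button class="primary slidedown-button align-right">Allow (reordered)</button>
  <button class="align-right secondary slidedown-button">No thanks</button>
</div>
""".strip()

root = ET.fromstring(BOX)

# [@class='...'] compares the whole attribute string, not individual classes,
# so the reordered class list on the second button does not match
exact = root.findall(".//button[@class='align-right primary slidedown-button']")
print([b.text for b in exact])  # ['Allow']
```

Only the first button matches, so if the site ever reorders or changes those classes, the XPath in the answer stops finding the button. In Selenium, a CSS selector such as button.slidedown-button matches on a single class regardless of order, which may be a more tolerant choice.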

I usually add some waiting time before accessing elements. This page took a while to load, and often you need to add waits before you can get the element or part of the page you want.

There are a couple of ways to do that; here I used the quick and dirty method with the time module. Selenium also has specific ways to wait for elements to appear, but I tried those and wasn't able to get them to work.

Please see here in docs and here for the specific parts worth knowing about.

If you look at the HTML, you will see that the link is inside a span element with class rh_button_wrapper, not a div.
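A minimal sketch of the span-versus-div point, using made-up markup modelled on that description (the example.com URL is hypothetical). It contrasts the question's tag-specific find_all with a CSS selector keyed only on the class, which would keep working if the site switched the wrapper tag again.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the link sits inside <span class="rh_button_wrapper">
HTML = (
    '<div class="post">'
    '<span class="rh_button_wrapper">'
    '<a href="https://example.com/out.php?go=xyz">GET COUPON CODE</a>'
    '</span></div>'
)

soup = BeautifulSoup(HTML, "html.parser")

# The question's query filters on the tag name, so it finds nothing here
print(soup.find_all("div", {"class": "rh_button_wrapper"}))  # []

# A CSS selector keyed only on the class works whatever the wrapper tag is
links = [a["href"] for a in soup.select(".rh_button_wrapper a[href]")]
print(links)  # ['https://example.com/out.php?go=xyz']
```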

AaronS
  • Bro, showing error: ```selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//button[@class="align-right primary slidedown-button"]"}``` Bro, I want to scrape the course link from all of the Coupon Scorpion posts. Does it work for all of the posts? – V K Aug 09 '20 at 07:38
  • It works on the particular site you specified in the question. I can't be certain it will work for all posts; your post didn't make clear that that was your intention. I would use the advice I have given you. Look at what driver.get(url) does for the specific URL you're getting. Does the notification box load? If not, that piece of code is not necessary. If it does, is the button's class attribute the same? If not, you need to change it. Time for you to do a bit of the heavy lifting here; try this out and report back. Exceptions give you a lot of information. – AaronS Aug 09 '20 at 07:43
  • Bro, it's showing me the error on the post I mentioned. Please solve the issue for this post first, then I will do the same for the other posts. I am testing on a Heroku free server with the headless and no-sandbox ChromeDriver options. I need the **Free Coupon Box** or **Limited Offer 86% Offer** button link available at the end of the post on the right side. – V K Aug 09 '20 at 07:47
  • I have no idea what you are wanting here. I have tested this code exactly as it is in my SO answer on the URL you posted, and it works. If it doesn't on your end, please follow the guidance I gave in the comment above. You have a NoSuchElementException; figure out what could cause it using the advice in that comment. Selenium is brittle, so look at what is happening in the headless browser. It could be as simple as your browser taking a long time to load the page; if so, adjust for that. You need to do some heavy lifting here; I have given you the code and a clear explanation of how to get the coupon link for this page. – AaronS Aug 09 '20 at 08:14
  • I am just telling you that you are extracting the wrong link. Please check my old comment and answer, and suggest an edit if you want to help, or I will try to do it myself. Thanks for your explanation :) – V K Aug 09 '20 at 08:19
  • There is no Free Coupon Box or Limited Offer 86% offer on the URL https://couponscorpion.com/marketing/complete-guide-to-pinterest-pinterest-growth-2020/. There is a GET COUPON CODE button at the bottom right-hand side of the page, which is the URL my code gets you. Have you tried the code and the link it gives you? If this is wrong, could you please show me, either with a picture or by telling me the URL of the coupon code you want on the page you're scraping? – AaronS Aug 09 '20 at 08:39
  • I've just seen the edits you suggested. You are suggesting a different URL from the one in your original post. Could you please confirm whether the code gets you the desired link for https://couponscorpion.com/marketing/complete-guide-to-pinterest-pinterest-growth-2020/ first, or whether the URL proposed in the edit is the one you actually wanted? If not, I'm a bit confused: I do not see a Free Coupon Box or Limited Offer 86% offer on the page https://couponscorpion.com/marketing/complete-guide-to-pinterest-pinterest-growth-2020/. – AaronS Aug 09 '20 at 08:45
  • I've now tried it on the bioinformatics link and the marketing link. Both grab the link for the free coupon; however, every time you load each URL, the coupon link changes slightly, which may be why you're saying it extracts the wrong link. The link it grabs still leads to the correct Udemy page. So please let me know if I'm wrong, but I'm not sure what else I can do to help you. – AaronS Aug 09 '20 at 08:54
  • All the Images and Info -> https://telegra.ph/All-Images-08-09 only i am getting error your system is not – V K Aug 09 '20 at 09:05
  • So you need to post your code in stackoverflow. Don’t expect anyone to type out lines upon lines of code to help you. You also need to post the error you’re getting. I still want you to confirm if the code example I have given you gives you the correct link to the correct page for the first url you posted, the couponscorpion marketing page. Not any other pages just that page. – AaronS Aug 09 '20 at 09:11
  • Bro, I gave the code at https://telegra.ph/All-Images-08-09; the error and everything I want is at that URL. Stack Overflow comments are too short to write the code in. – V K Aug 09 '20 at 09:18
  • I didn’t see the error message on my phone, so sorry about that. I am still not copying all your code from an image; I’m sorry, it’s just not going to happen. I suggest turning off the headless option in Selenium to see what the browser is actually doing. As I said, it could be as simple as your browser taking too long to load, and I can’t help you with that. Make sure the allow box is clicked before you read the page source. – AaronS Aug 09 '20 at 10:04