I am trying to scrape the reduced price of a product from a website.

The HTML looks like this when inspecting the website:

[Screenshot: HTML from Inspector]

My code looks like this:

from bs4 import BeautifulSoup
from selenium import webdriver

browser = webdriver.Chrome(executable_path='/chromedriver.exe')
browser.get('https://www.mydays.de/magicbox/kurzurlaub')
soup = BeautifulSoup(browser.page_source, 'html.parser')
price = soup.find('div', {"class": "c-mbvoucher__pricebox"})

But my result looks like this:

<div class="c-mbvoucher__pricebox">
<span class="c-mbvoucher__price">159 €</span>
<span class="c-mbvoucher__person">
            für 2 Personen        </span>
</div>

Why is some information missing in my result?

I also tried find_all, but the above is the only one matching.

undetected Selenium

1 Answer

To extract the text 159 € you can use either of the following locator strategies:

  • Using a CSS selector:

    print(driver.find_element(By.CSS_SELECTOR, "div.c-mbvoucher__pricebox>span.c-mbvoucher__price").text)
    
  • Using XPath:

    print(driver.find_element(By.XPATH, "//div[@class='c-mbvoucher__pricebox']/span[@class='c-mbvoucher__price']").text)
    

Ideally, wait for the element to become visible before reading it, using WebDriverWait together with visibility_of_element_located(). Either of the following locator strategies works:

  • Using CSS_SELECTOR:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.c-mbvoucher__pricebox>span.c-mbvoucher__price"))).get_attribute("innerHTML"))
    
  • Using XPATH:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='c-mbvoucher__pricebox']/span[@class='c-mbvoucher__price']"))).text)
    
  • Console Output:

    159 €
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
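
To give an idea of what such an explicit wait does, here is a minimal, Selenium-free sketch of the polling idea: call a condition repeatedly until it returns something truthy or a timeout expires. This is not Selenium's implementation; `wait_until` and `FakePage` are hypothetical names invented for this illustration, and the fake page simply pretends that the price shows up on the third poll.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


class FakePage:
    """Hypothetical stand-in for a page that renders its price with a delay."""

    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def price_text(self):
        # Returns None until the page has been polled ready_after times.
        self.calls += 1
        return "159 €" if self.calls >= self.ready_after else None


page = FakePage(ready_after=3)
price = wait_until(page.price_text, timeout=2.0, poll_interval=0.01)
print(price)
```

With Selenium itself you would not write this loop by hand; `WebDriverWait(driver, 20).until(...)` plays the role of `wait_until`, and the expected condition plays the role of `page.price_text`.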
    

Update

If your use case is to extract the text 119,25 € instead, note that this text sits in a text node directly inside the div rather than in an element of its own, so you can use either of the following solutions:

  • Using CSS_SELECTOR:

    print(driver.execute_script('return arguments[0].firstChild.textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.c-mbvoucher__pricebox")))).strip())
    
  • Using XPATH:

    print(driver.execute_script('return arguments[0].firstChild.textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='c-mbvoucher__pricebox']")))).strip())
    
  • Console Output:

    119,25 €
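
To illustrate the difference between a text node and an element without starting a browser, here is a small sketch using only the standard library. The markup in `SNIPPET` is an assumption: the real page's HTML is only visible in the screenshot, so this just mirrors the situation described above, where the offer price is bare text placed before the span elements. Real-world HTML is often not well-formed XML, so treat this as a demonstration of the idea, not as a scraper.

```python
import xml.etree.ElementTree as ET

# ASSUMED markup, not copied from the real page: the offer price is a bare
# text node that comes before the <span> elements inside the price box.
SNIPPET = """
<div class="c-mbvoucher__pricebox">
    119,25 €
    <span class="c-mbvoucher__price">159 €</span>
    <span class="c-mbvoucher__person">für 2 Personen</span>
</div>
"""

pricebox = ET.fromstring(SNIPPET.strip())

# Element.text holds the text that precedes the first child element,
# roughly what firstChild.textContent returns in the browser here.
offer_price = (pricebox.text or "").strip()

# The original price lives inside its own <span>, so it is found by class.
original_price = pricebox.find("span[@class='c-mbvoucher__price']").text.strip()

print(offer_price)
print(original_price)
```

Under the assumed markup, a selector that targets span.c-mbvoucher__price can only ever return the original price, which is why the solutions above read the first child of the div instead.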
    
undetected Selenium
  • Thank you very much for your reply! I may have phrased my question badly: I'm trying to extract the offer price (see screenshot: 119,25 €). That price doesn't appear in my result, only the 159 €, which is the original price. – Lukas Zimmermann Nov 24 '20 at 13:41
  • @LukasZimmermann Check out the answer update and let me know the status. – undetected Selenium Nov 24 '20 at 14:26