Hi, I am trying to scrape data from inside an iframe tag that contains a widget loader. I tried to scrape the rating and the review count using Scrapy and Selenium, but I am not able to extract the information.

HTML:

<div class="tp-widget-summary__rating"><span class="rating">2.3</span> / 5
<span class="separator">•</span>
<span class="tp-widget-summary__count"><strong>3</strong> reviews</span></div>

Python code:

self.driver.get(url) 

page_source = response.replace(body=self.driver.page_source) 

page_source.css(".tp-widget-summary__rating span::text").extract_first() 

I also tried plain Scrapy code and other approaches such as XPath.

Expected: {'rating':2.3,'reviews':3}

Developer

1 Answer

Because the element is inside an <iframe>, you have to:

  • Induce WebDriverWait for the desired frame to be available and switch to it.

  • Induce WebDriverWait for the desired element to be visible, using visibility_of_element_located().

  • You can use the following locator strategy:

    • Using CSS_SELECTOR:

      driver.get('https://shop.resmed.com.au/dreampad-pillow/')
      # Wait for the Trustpilot iframe, then switch the driver into it
      WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"div#trustpilotReviewsWidget iframe[title='Customer reviews powered by Trustpilot']")))
      # Rating value
      print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.tp-widget-summary__information div.tp-widget-summary__rating > span.rating"))).text)
      # Review count
      print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.tp-widget-summary__information div.tp-widget-summary__rating > span.tp-widget-summary__count > strong"))).text)
      
    • Console output:

      2.3
      3
      
  • Note: You have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
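
To get from the two printed strings to the dict the question expects, the steps above could be wrapped roughly as follows. This is only a sketch: `parse_summary` and `scrape_summary` are hypothetical helper names, the selectors are copied from the snippet above, and stripping commas from the count is an assumption about how larger review counts might be formatted.

```python
# Selectors copied from the answer above; they are tied to that page's markup.
FRAME = "div#trustpilotReviewsWidget iframe[title='Customer reviews powered by Trustpilot']"
RATING = "div.tp-widget-summary__information div.tp-widget-summary__rating > span.rating"
COUNT = ("div.tp-widget-summary__information div.tp-widget-summary__rating"
         " > span.tp-widget-summary__count > strong")


def parse_summary(rating_text, count_text):
    """Hypothetical helper: turn the widget's text values into a dict."""
    return {
        "rating": float(rating_text.strip()),
        # Assumption: larger counts may contain thousands separators.
        "reviews": int(count_text.strip().replace(",", "")),
    }


def scrape_summary(driver, url, timeout=10):
    """Hypothetical helper: load url, enter the iframe, read both values."""
    # Imported here so parse_summary can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    driver.get(url)
    try:
        # Wait for the iframe and switch the driver's context into it.
        wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, FRAME)))
        rating = wait.until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, RATING))).text
        count = wait.until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, COUNT))).text
    finally:
        # Return to the top-level document so later lookups are not scoped to the frame.
        driver.switch_to.default_content()
    return parse_summary(rating, count)
```

Inside a Scrapy spider this could be called as `scrape_summary(self.driver, url)` and the returned dict yielded as the item. Switching back with `driver.switch_to.default_content()` matters if the same driver is reused for other elements on the page.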
    
undetected Selenium