0

I'm trying to scrape the audience score from rotten tomatoes. I was able to get reviews but not sure how use selenium to get the "audiencescore"

Source:

<score-board
audiencestate="upright"
audiencescore="96"
class="scoreboard"
rating="R"
skeleton="panel"
tomatometerstate="certified-fresh"
tomatometerscore="92"
data-qa="score-panel"
                >
<h1 slot="title" class="scoreboard__title" data-qa="score-panel-movie-title">Pulp Fiction</h1>
<p slot="info" class="scoreboard__info">1994, Crime/Drama, 2h 33m</p>
<a slot="critics-count" href="/m/pulp_fiction/reviews?intcmp=rt-scorecard_tomatometer-reviews" class="scoreboard__link scoreboard__link--tomatometer" data-qa="tomatometer-review-count">110 Reviews</a>
<a slot="audience-count" href="/m/pulp_fiction/reviews?type=user&amp;intcmp=rt-scorecard_audience-score-reviews" class="scoreboard__link scoreboard__link--audience" data-qa="audience-rating-count">250,000+ Ratings</a>
<div slot="sponsorship" id="tomatometer_sponsorship_ad"></div>
                </score-board>

Code:

from selenium import webdriver

driver = webdriver.Firefox()
url = 'https://www.rottentomatoes.com/m/pulp_fiction'
driver.get(url)

print(driver.find_element_by_css_selector('a[slot=audience-count]').text)
Lacer
  • 5,668
  • 10
  • 33
  • 45

3 Answers3

1

The attribute value of audiencescore which is not any text nodes value that's why we can't invoke .text method to grab that value. So you have to call get_attribute() after selecting the right locator. The following expression is working.

print(driver.find_element(By.CSS_SELECTOR,'#topSection score-board').get_attribute('audiencescore'))

#import

from selenium.webdriver.common.by import By
Md. Fazlul Hoque
  • 15,806
  • 5
  • 12
  • 32
0

Try this:

1- Get element score-board

2- Get audiencescore attribute from element

audiencescore = driver.find_element_by_css_selector('score-board').get_attribute('audiencescore')
Wonka
  • 1,548
  • 1
  • 13
  • 20
0

You were close enough. To extract the value of the audiencescore attribute i.e. the text 96 ideally you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get("https://www.rottentomatoes.com/m/pulp_fiction")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "score-board.scoreboard"))).get_attribute("audiencescore"))
    
  • Using XPATH:

    driver.get("https://www.rottentomatoes.com/m/pulp_fiction")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//score-board[@class='scoreboard']"))).get_attribute("audiencescore"))
    
  • Note : You have to add the following imports :

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console Output:

    96
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python

undetected Selenium
  • 183,867
  • 41
  • 278
  • 352