I have an HTML page that contains 40 copies of the following div structure:

<div class='movie-featured'>
    <div class="item analytics">
        <div class="movie-details">
            <div class="movie-rating-wrapper">
                <span class="movie-rating-summary">
                    <span>some text</span>
                </span>
            </div>
        </div>
    </div>
</div>

I'm trying to get the text of the inner span, <span>some text</span>, from inside each of the 40 divs with: find_element_by_css_selector('span.moview-rating-summary').find_element_by_tag_name('span').text

Output:

['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '6/10', '', '', '', '', '', '', '', '', '7.5/10', '', '', '', '', '']

As you can see, I only get text from a few of the spans, not all of them.

I also tried: find_element_by_tag_name('span').get_attribute('textContent') and find_element_by_tag_name('span').get_attribute('innerHTML').

But I still get the same result.

Any ideas how to fix this?

Code trials:

from selenium import webdriver
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

browser = webdriver.Chrome()
delay = 10
browser.get("http://www.example.com")  # get() needs a full URL, including the scheme

# scroll to the bottom of the page
browser.execute_script("window.scrollTo(0,document.body.scrollHeight)")
time.sleep(2)
images = []

myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.CLASS_NAME, 'item-responsive')))

body = browser.find_element_by_class_name('movie-featured')  # body of images container
imageItems = body.find_elements_by_css_selector('div.item.analytics')  # list of divs that hold movie images

for item in imageItems:
    rate = item.find_element_by_css_selector('span.moview-rating-summary').text
    images.append(rate)

print(images)
browser.close()

Thank you guys for all the help you gave. I fixed the problem by changing my code as follows:

body = browser.find_element_by_class_name('movie-featured')
# note: an XPath starting with '//' searches the whole page, not just inside body
rateDivs = body.find_elements_by_xpath('//div[@class="moview-rating-wrapper"]')
ratelist = []
for div in rateDivs:
    span = div.find_element_by_css_selector('span.moview-rating-summary')
    ratespan = span.find_element_by_tag_name('span')
    rate = ratespan.text
    if len(rate) > 0:  # keep only spans that returned some text
        ratelist.append(rate)
print(ratelist)

browser.close()

I really appreciate all the time you spent to help me.

Martin Wittick
  • Good first question! Keep posting! – shellter Sep 05 '20 at 15:04
  • Can you share the URL you're using? I think there's a more efficient way using find_elements_ to just get what you want without the loop, but I'd like to test it before I post an answer – RichEdwards Sep 05 '20 at 17:18
  • @RichEdwards the URL I'm using is private to my ISP in my country and it won't work outside the country – Martin Wittick Sep 05 '20 at 17:37
  • It feels like there's something in your identifiers that's not working as expected. Do you know how to use devtools to review all matches? – RichEdwards Sep 05 '20 at 19:03
  • @RichEdwards No, unfortunately I don't know how to review them. – Martin Wittick Sep 06 '20 at 10:00
  • Open devtools (F12), go to the Elements tab, press Ctrl+F, and enter your XPath or CSS selector. Everything your identifier matches is what find_elements will return. Review those matches and make sure they all have text as you expect. The XPath by TheLegend42 is pretty much what I would use, so put that in and see if all the spans have text. If you're certain your spans have text, then consider whether they all need to be scrolled into view to "exist". – RichEdwards Sep 06 '20 at 10:37
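
The last suggestion above, scrolling each element into view before reading it, could look roughly like the sketch below. The helper names text_after_scroll and texts_after_scroll are made up for illustration and are not part of Selenium; only execute_script and .text are Selenium calls. Whether scrolling actually makes the missing text appear depends on how the page loads its content, and a lazy-loading page may also need a short wait after each scroll.

```python
def text_after_scroll(driver, element):
    """Scroll one element into view, then return its rendered text."""
    # execute_script hands the element to the page as arguments[0];
    # scrollIntoView is a standard DOM method.
    driver.execute_script("arguments[0].scrollIntoView(true);", element)
    return element.text


def texts_after_scroll(driver, elements):
    """Apply text_after_scroll to each element, keeping the original order."""
    return [text_after_scroll(driver, element) for element in elements]
```

With the question's variables this would be called as texts_after_scroll(browser, rateSpans), where rateSpans is whatever list find_elements returned.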

2 Answers

1

To extract the text (e.g. some text) from all of the <span> elements using Selenium, you have to induce WebDriverWait for visibility_of_all_elements_located(). You can use either of the following Locator Strategies:

  • Using CSS_SELECTOR and get_attribute("innerHTML"):

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span.movie-rating-summary>span")))])
    
  • Using XPATH and text attribute:

    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='movie-rating-summary']/span")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
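
As a variation on the one-liners above, here is a minimal sketch that also drops empty results. It is not the code from this answer: clean_texts and read_ratings are hypothetical names, and the selector assumes the class is spelled movie-rating-summary as in the question's HTML (the question's Python uses a different spelling, so check it in devtools). It waits for presence instead of visibility and reads textContent, because .text only returns text as rendered and comes back empty for elements that are hidden.

```python
def clean_texts(raw_texts):
    """Strip whitespace and drop entries that are empty or None."""
    return [t.strip() for t in raw_texts if t and t.strip()]


def read_ratings(driver, timeout=20):
    """Return the non-empty text of every rating span on the current page."""
    # Imported here so clean_texts() stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumed selector, based on the HTML shown in the question.
    locator = (By.CSS_SELECTOR, "span.movie-rating-summary > span")
    spans = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located(locator)
    )
    # textContent is available even when the element is not displayed.
    return clean_texts(span.get_attribute("textContent") for span in spans)
```

Calling print(read_ratings(driver)) after driver.get(...) would then print only the ratings that actually contain text.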
    

undetected Selenium
0

Try this:

driver.find_element_by_xpath('//span[@class="movie-rating-summary"]/span[1]')
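
Note that find_element_by_xpath returns only the first matching element, and an element rather than its text. To read all 40 ratings, the same XPath can be passed to find_elements, roughly as in the sketch below. The function name all_rating_texts is made up for illustration, and the class name is the one from the question's HTML.

```python
RATING_XPATH = '//span[@class="movie-rating-summary"]/span[1]'


def all_rating_texts(driver):
    """Return the text of every element matched by RATING_XPATH."""
    # "xpath" is the value of Selenium's By.XPATH constant; the plain string
    # is used so this sketch does not need to import Selenium itself.
    elements = driver.find_elements("xpath", RATING_XPATH)
    return [element.text for element in elements]
```

Empty strings in the returned list correspond to spans whose text Selenium could not see at that moment.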
TheLegend42