Getting text value of an HTML tag through Selenium Web Automation in Python?

Question

I am making a Reddit bot that looks for certain attributes in comments, uses Selenium to visit the information website, and uses driver.find_element_by... to get the value inside a tag, but it is not working.

When I use driver.find_element_by_class_name(), this is the data returned:

<selenium.webdriver.remote.webelement.WebElement (session="f454dcf92728b9db4de080a27a844bf7", element="514bd57d-99d7-4fce-a05d-3fa92f66ad49")>

When I use driver.find_elements_by_css_selector(".style-scope.ytd-video-renderer"), this is returned:

[
  <selenium.webdriver.remote.webelement.WebElement (session="43cb953cde81df270260bf769fe081a2", element="6b4ee3e2-5e6b-48e2-8ec8-9083bf15baea")>, 
  <selenium.webdriver.remote.webelement.WebElement (session="43cb953cde81df270260bf769fe081a2", ...
]

Suppose that this is what I'm trying to locate (the code above returned the Selenium output shown above for this tag):

<yt-formatted-string class="style-scope ytd-video-renderer" aria-label="Sword Art Online: Alicization Lycoris Opening Full『ReoNa - Scar/let』 by Melodic Star 2 months ago 4 minutes, 18 seconds 837,676 views">Sword Art Online: Alicization Lycoris Opening Full『ReoNa - Scar/let』</yt-formatted-string>

What I want

I want Sword Art Online: Alicization Lycoris Opening Full『ReoNa - Scar/let』 returned.

What could I do?

score 4 · Answer 1 · edited Oct 26 '20 at 07:54

Use .text:

element = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string')
print(element.text)

answered Sep 26 '20 at 11:49 by Stroe Andrei · edited Oct 26 '20 at 07:54 by frianH

score 0 · Accepted Answer · answered Sep 26 '20 at 17:27

You were pretty close. driver.find_element_by_class_name() returns the first matching WebElement. Printing that object gives:

<selenium.webdriver.remote.webelement.WebElement (session="f454dcf92728b9db4de080a27a844bf7", element="514bd57d-99d7-4fce-a05d-3fa92f66ad49")>

This is the representation of the WebElement object itself, not its text. The desired text has to be read from the element.

Similarly, driver.find_elements_by_css_selector(".style-scope.ytd-video-renderer") returns a list of matching WebElements, and printing that list gives:

[
  <selenium.webdriver.remote.webelement.WebElement (session="43cb953cde81df270260bf769fe081a2", element="6b4ee3e2-5e6b-48e2-8ec8-9083bf15baea")>, 
  <selenium.webdriver.remote.webelement.WebElement (session="43cb953cde81df270260bf769fe081a2",
  ...
]

Solution

To extract the text Sword Art Online: Alicization Lycoris Opening Full『ReoNa - Scar/let』 from the following HTML:

<yt-formatted-string class="style-scope ytd-video-renderer" aria-label="Sword Art Online: Alicization Lycoris Opening Full『ReoNa - Scar/let』 by Melodic Star 2 months ago 4 minutes, 18 seconds 837,676 views">Sword Art Online: Alicization Lycoris Opening Full『ReoNa - Scar/let』</yt-formatted-string>

You can use either of the following Locator Strategies:

Using css_selector and get_attribute():

print(driver.find_element_by_css_selector("yt-formatted-string.style-scope.ytd-video-renderer").get_attribute("innerHTML"))

Using xpath and text attribute:

print(driver.find_element_by_xpath("//yt-formatted-string[@class='style-scope ytd-video-renderer']").text)

Ideally, to print the text you should first wait for the element with WebDriverWait and visibility_of_element_located(). You can use either of the following Locator Strategies:

Using CSS_SELECTOR and get_attribute():

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "yt-formatted-string.style-scope.ytd-video-renderer"))).get_attribute("innerHTML"))

Using XPATH and text attribute:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//yt-formatted-string[@class='style-scope ytd-video-renderer']"))).text)

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python

Outro

Links to useful documentation:

get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium
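
To make the difference between the two approaches concrete, here is a rough standard-library sketch. It does not use Selenium: approximate_text is a hypothetical helper and the sample markup is made up. get_attribute("innerHTML") returns the raw markup inside the element, with nested tags and entities left as written, while .text returns the rendered, visible text. The helper only imitates the second behaviour by dropping tags and decoding entities; the real .text also ignores hidden content, which this sketch does not model.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores tags (entities are decoded by default)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def approximate_text(inner_html):
    """Hypothetical helper: roughly what .text would show for this markup."""
    parser = _TextOnly()
    parser.feed(inner_html)
    parser.close()
    # Collapse runs of whitespace, as rendered text normally does.
    return " ".join("".join(parser.parts).split())


# Made-up innerHTML value containing an entity and a nested tag.
inner_html = "Tom &amp; Jerry <b>HD</b>"
rendered = approximate_text(inner_html)
print(repr(inner_html))
print(repr(rendered))
```

For the yt-formatted-string element quoted in the answer, the content shown is plain text with no nested tags, so both approaches would be expected to give the same string.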

Thanks, just want you to dwell on that a bit more - suppose that I have multiple ` Lorem Ipsum >`, and I use the `driver.find_elements_by...` instead of the `driver.find_element_by...`, will it return the text as one long string or will it have a new line for each attribute (I'm storing this data and having it reply to a Reddit comment via PRAW, so if I use `comment.reply("tags: {}".format(tags))`, will it just put all the tags together in one string with no spaces or will it give a space between each tag? — KazutoKiritoKirigaya, Sep 26 '20 at 17:55
@KazutoKiritoKirigaya `driver.find_elements` will always return a list. You have to iterate the list to extract the text. — undetected Selenium, Sep 26 '20 at 17:57
That's the thing, it says that `driver.find_elements_by...` is not iterable. — KazutoKiritoKirigaya, Sep 26 '20 at 17:58
@KazutoKiritoKirigaya True, but we can offer you an optimal solution too :) Feel free to raise a new question as per your new requirement. StackOverflow contributors will be happy to help you out. — undetected Selenium, Sep 26 '20 at 18:00
Is there no obvious alternative solution to this problem so that I can extract the text from the `driver.find_elements_by...`? — KazutoKiritoKirigaya, Sep 26 '20 at 18:12
@KazutoKiritoKirigaya There are solutions for that as well. But as the context is different, I'm suggesting you raise a new question so that this question and the new one are both helpful for future readers. — undetected Selenium, Sep 26 '20 at 18:15
Done, hope you answer there! Found your answer to this quite helpful. — KazutoKiritoKirigaya, Sep 26 '20 at 18:50
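
On the follow-up in the comments: a minimal sketch of reading the text from every matched element and joining it for a reply. It assumes the generic driver.find_elements(by, value) call, which always returns a list. Note that the find_element_by_* and find_elements_by_* helpers used in this thread were removed in Selenium 4.3, so the generic call is the form to use on current versions. collect_texts, format_tags and FakeDriver are hypothetical names, and FakeDriver with its sample titles only stands in for a real browser session so the example runs on its own.

```python
from types import SimpleNamespace


def collect_texts(driver, by, value):
    """Return the non-empty visible text of every element matching the locator."""
    # find_elements returns a list (empty when nothing matches),
    # so it can be iterated directly.
    elements = driver.find_elements(by, value)
    return [el.text.strip() for el in elements if el.text.strip()]


def format_tags(texts, sep=", "):
    """Join the texts explicitly instead of formatting the list object itself."""
    return "tags: {}".format(sep.join(texts))


class FakeDriver:
    """Stand-in for a WebDriver (assumption: made-up data, no browser needed)."""

    def find_elements(self, by, value):
        return [
            SimpleNamespace(text="First title"),
            SimpleNamespace(text="  "),
            SimpleNamespace(text="Second title"),
        ]


# With a real driver, pass By.CSS_SELECTOR from selenium.webdriver.common.by;
# its value is the plain string "css selector" used here.
texts = collect_texts(
    FakeDriver(),
    "css selector",
    "yt-formatted-string.style-scope.ytd-video-renderer",
)
reply = format_tags(texts)
print(reply)
```

Passing the list of WebElements straight to "tags: {}".format(...) would embed the list's representation, like the output shown in the question, so the texts are extracted first and joined with an explicit separator.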
