3

I am trying to get the elements displayed as N06D-X, N07X, R01A-C01 and S01G-X01 in the following image: HTML snapshot

So far, I have obtained the elements through the WebDriver like this:

who = driver.find_element_by_tag_name("span").find_elements_by_tag_name("p")

and get an output like this:

[<selenium.webdriver.remote.webelement.WebElement (session="1c044455cf883fdedf6845bcd456bfab", element="0.23338884730774767-2")>]

I am working on macOS Catalina, and when I type who.text it returns an empty list for some reason. I had quite similar trouble with BeautifulSoup, but I solved it by using .string rather than .text. Here, .string does not work on WebDriver elements.

The question is: how can I get the items N06D-X and so on with Selenium?

undetected Selenium
Lusian

3 Answers

2

It seems you were pretty close.

[<selenium.webdriver.remote.webelement.WebElement (session="1c044455cf883fdedf6845bcd456bfab", element="0.23338884730774767-2")>]

represents the element itself, whereas you were looking for the text within the element.

To extract the texts (e.g. N06D-X, N07X, etc.) from all of the <p> tags using Selenium, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR and get_attribute("innerHTML"):

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "li.data-list__property#who-atc-codes span.data-list__property-value p")))])
    
  • Using XPATH and text attribute:

    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//li[@class='data-list__property' and @id='who-atc-codes']//span[@class='data-list__property-value']//p")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
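
The distinction drawn at the top of this answer, between a list of WebElement objects and the text inside each one, can be sketched without a browser. The helper below is only an illustration: `element_texts` and `FakeElement` are hypothetical names, and `FakeElement` is a stand-in that mimics just the `.text` attribute and `get_attribute()` method of a real element. It also assumes that falling back to `textContent` is acceptable when `.text` is empty, which typically happens when the element is not displayed.

```python
def element_texts(elements):
    """Return one string per element, preferring the rendered text."""
    texts = []
    for el in elements:
        text = el.text  # visible text only; empty when the element is not displayed
        if not text:
            # textContent is read from the DOM regardless of visibility
            text = el.get_attribute("textContent") or ""
        texts.append(text.strip())
    return texts


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement, for demonstration only."""

    def __init__(self, text, text_content):
        self.text = text
        self._text_content = text_content

    def get_attribute(self, name):
        return self._text_content if name == "textContent" else None


# The second element simulates a hidden <p>: .text is empty, textContent is not.
who = [FakeElement("N06D-X", "N06D-X"), FakeElement("", " N07X ")]
print(element_texts(who))
```

With a real driver, the list returned by `find_elements_by_tag_name("p")` or by the `WebDriverWait` calls above could be passed to the same helper in place of `who`.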
    

undetected Selenium
  • 1
    Thank you all. @DebanjanB, what does `EC.visibility_of_all_elements_located()` do, and how does it work? It seems like a pretty useful tool. As I understand it, you wait up to 20 seconds until the condition in EC is met. – Lusian Sep 26 '20 at 08:07
  • 1
    `visibility_of_all_elements_located()` is one of the [`expected_conditions`](https://stackoverflow.com/questions/59130200/selenium-wait-until-element-is-present-visible-and-interactable/59130336#59130336) used with [WebDriverWait](https://stackoverflow.com/questions/52603847/how-to-sleep-webdriver-in-python-for-milliseconds/52607451#52607451), which halts the execution of the [WebDriver](https://stackoverflow.com/questions/48079120/what-is-the-difference-between-chromedriver-and-webdriver-in-selenium/48080871#48080871) until all matching elements are displayed and have a height and width greater than 0. – undetected Selenium Sep 26 '20 at 14:18
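
To make the "wait up to 20 seconds until the condition is met" idea from these comments concrete, here is a simplified, browser-free sketch of a polling wait. It is not Selenium's implementation: `wait_until` and `elements_ready` are hypothetical names, and the real `WebDriverWait.until()` passes the driver to the condition and raises Selenium's own `TimeoutException` rather than the built-in `TimeoutError` used here. The 0.5 second default interval mirrors Selenium's default poll frequency.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # the truthy value is handed back to the caller
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Hypothetical condition: it "finds" the elements only on the third poll.
attempts = {"count": 0}


def elements_ready():
    attempts["count"] += 1
    return ["N06D-X", "N07X"] if attempts["count"] >= 3 else False


codes = wait_until(elements_ready, timeout=2.0, poll_interval=0.01)
print(codes, attempts["count"])
```

The point is that the wait returns as soon as the condition succeeds, so a 20 second timeout is an upper bound and not a fixed delay.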
1

You don't have to search the whole page; you can search inside a previously found element instead.

# find_element_by_id (singular) returns a single WebElement that can be searched further
li_object = driver.find_element_by_id('who-atc-codes')
lst = li_object.find_element_by_tag_name("span").find_elements_by_tag_name("p")

for p in lst:
    print(p.text)
    print(p.get_attribute('innerHTML'))

or you can try

span_object = li_object.find_element_by_tag_name("span")
print(span_object.get_attribute('innerHTML'))
woblob
  • Ok. But now how can I get the .text and then all elements inside (i.e. N06D and so on) specifically? Maybe li_object[0].get_attribute("innerHTML")? But then how can I get N06D and so on? Maybe something like: for i in range(len(who)): print(who[i].get_attribute("innerHTML")) ? – Lusian Sep 25 '20 at 17:08
1

Use the following CSS selector to get the list of items, and then iterate over it.

To get the text, you can use either .text, .get_attribute("innerHTML") or .get_attribute("textContent").

Code:

items = driver.find_elements_by_css_selector("span.data-list__property-value>p")
for item in items:
    print(item.text)
    print(item.get_attribute("innerHTML"))
    print(item.get_attribute("textContent"))
    # To get only the value from the string, use split and take the first element.
    print(item.text.strip().split(" ")[0])
    print(item.get_attribute("innerHTML").strip().split(" ")[0])
    print(item.get_attribute("textContent").strip().split(" ")[0])
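
The split step in the loop above can be tried on plain strings, without a browser. This is a minimal sketch: `first_token` is a hypothetical helper, and the sample strings are made up, since the actual text of the <p> elements is not shown in the question. It uses `split()` with no argument, which is one possible variation on `strip().split(" ")`: it splits on any run of whitespace, including newlines, and the helper returns an empty string when there is nothing to split.

```python
def first_token(raw):
    """Return the first whitespace-separated token of raw, or "" if there is none."""
    parts = raw.split()  # no argument: splits on any whitespace run and ignores the ends
    return parts[0] if parts else ""


# Hypothetical sample strings standing in for item.text / textContent values.
samples = ["N06D-X  some description", "\n  N07X\n", ""]
print([first_token(s) for s in samples])
```

Inside the loop this could be applied as `first_token(item.text)` or `first_token(item.get_attribute("textContent"))`.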
KunduK