Python - Selenium - can't scrape specific text content from HTML

Question

I am trying to scrape this part of an HTML page:

<td class="zebraTable__td zebraTable__td--companyName"><a href="/unternehmen/8116602/schneider-electric-holding-germany-gmbh" data-gtm="companySearch__searchResult--76">
                        Schneider Electric Holding Germany GmbH
                    </a></td>

from this Site:

https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4

with this Code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
import pandas as pd
import time 

driver = webdriver.Chrome('/Users/rieder/Anaconda3/chromedriver_win32/chromedriver.exe')

driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=500&employeesTo=100000000&sortMethod=revenueDesc&p=1')

driver.find_element_by_id("cookiesNotificationConfirm").click();

company_name = driver.find_element_by_class_name('zebraTable__td zebraTable__td--companyName')

print(company_name)

I have been trying for 4 hours and can't get it to work. I tried different methods such as XPath, link text, etc., but all I get is an empty company name like this: "[ ]".

Does anyone know how Selenium can find this exact piece of text, "Liebherr-Hausgeräte Ochsenhausen GmbH"?

Thanks a lot.

score 0 · Answer 1 · answered Aug 27 '20 at 11:53

What you are looking for is already in the source code of the page, under

<div data-company-search><div data-var-name="companyResults" data

Since it is part of the page source, you do not need Selenium to get it. Just read the page with requests and find the data using Beautiful Soup.
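
As a rough illustration of parsing the page source without a browser, here is a minimal sketch. It rests on assumptions: it targets the table cell markup shown in the question rather than the companyResults attribute (whose format is not shown here), it assumes that markup is present in the static source, and it uses the standard library's html.parser in place of Beautiful Soup so it runs without extra installs. CompanyNameParser is a made-up helper name.

```python
from html.parser import HTMLParser


class CompanyNameParser(HTMLParser):
    """Collect the link text inside every company-name table cell."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_cell = False
        self._in_link = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and "zebraTable__td--companyName" in classes:
            self._in_cell = True
        elif tag == "a" and self._in_cell:
            self._in_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # Collapse line breaks and indentation into single spaces.
            self.names.append(" ".join("".join(self._buffer).split()))
            self._in_link = False
        elif tag == "td":
            self._in_cell = False


# Sample markup copied from the question. A real run would feed in the
# downloaded page source instead (for example requests.get(url).text).
sample = """
<td class="zebraTable__td zebraTable__td--companyName"><a href="/unternehmen/8116602/schneider-electric-holding-germany-gmbh" data-gtm="companySearch__searchResult--76">
                        Schneider Electric Holding Germany GmbH
                    </a></td>
"""

parser = CompanyNameParser()
parser.feed(sample)
print(parser.names)
```

Because the parser appends one entry per matching cell, feeding it a full results page should yield every company name on that page, provided the assumption about the static markup holds.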

balderman

You are right! But I need this part of the code for a script that generates a list of all the employee names. My fault, I should have explained the whole thing, sorry. – Yankzz Aug 31 '20 at 10:17

undetected Selenium · Accepted Answer · 2020-08-27T12:12:57.630

To print the text Schneider Electric Holding Germany GmbH, wait for the element with WebDriverWait and visibility_of_element_located(). You can use either of the following locator strategies:

Using CSS_SELECTOR and text attribute:

driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#cookiesNotificationConfirm"))).click()
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.zebraTable.zebraTable--companies tr:nth-child(2)>td.zebraTable__td.zebraTable__td--companyName>a"))).text)

Using XPATH and get_attribute("innerHTML"):

driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@id='cookiesNotificationConfirm']"))).click()
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='zebraTable zebraTable--companies']//following::tr[2]/td[@class='zebraTable__td zebraTable__td--companyName']/a"))).get_attribute("innerHTML"))

Console Output:

Schneider Electric Holding Germany GmbH

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
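
A side note on the locator in the question: the value "zebraTable__td zebraTable__td--companyName" is two class names separated by a space, while a class-name locator expects a single class name. The CSS selector used above joins them with dots instead. A small sketch of that conversion follows; classes_to_css is a hypothetical helper written for this illustration, not part of Selenium.

```python
def classes_to_css(tag, class_attr):
    """Turn an HTML class attribute value into a CSS selector.

    A class attribute such as "a b" holds two class names, so the
    selector needs one ".name" part per class.
    """
    return tag + "".join("." + name for name in class_attr.split())


selector = classes_to_css("td", "zebraTable__td zebraTable__td--companyName")
print(selector)

# The result can then be passed to Selenium, for example:
# driver.find_element(By.CSS_SELECTOR, selector + " > a")
```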

Outro

Link to useful documentation:

get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium
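
One practical difference between the two: get_attribute("innerHTML") returns the raw content inside the element, so for the link in the question the value would generally include the line breaks and indentation around the name, whereas text returns the rendered text. A minimal sketch of cleaning such a value, assuming the innerHTML matches the markup quoted in the question:

```python
# Assumed innerHTML of the <a> element from the question, including the
# line breaks and indentation that sit inside the tag.
inner_html = """
                        Schneider Electric Holding Germany GmbH
                    """

# Collapse runs of whitespace and trim both ends.
name = " ".join(inner_html.split())
print(repr(name))
```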

This worked like a charm, thanks a lot. I tried to integrate this code into my full script, which tries to generate a list of all the company names with 500 or more employees, but it always returns just the first name in the list when I repeat the command. I think it's because .get_attribute() only gets one attribute and not all the attributes found by the XPath?! – Yankzz, Aug 31 '20 at 10:14
@Yankzz This answer is specifically about extracting the text Schneider Electric Holding Germany GmbH. For the list of all the company names we need to adjust the locators. Could you raise a new question with your new requirement, please? – undetected Selenium, Aug 31 '20 at 10:18
Thanks, I opened a new question: https://stackoverflow.com/questions/63669207/python-selenium-webscrape-table-with-text-in-html-using-webdriverwait – Yankzz, Aug 31 '20 at 10:39
