I am trying to scrape this part of an HTML page:

<td class="zebraTable__td zebraTable__td--companyName"><a href="/unternehmen/8116602/schneider-electric-holding-germany-gmbh" data-gtm="companySearch__searchResult--76">
                        Schneider Electric Holding Germany GmbH
                    </a></td>

from this site:

https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4

with this code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
import pandas as pd
import time 

driver = webdriver.Chrome('/Users/rieder/Anaconda3/chromedriver_win32/chromedriver.exe')

driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=500&employeesTo=100000000&sortMethod=revenueDesc&p=1')

driver.find_element_by_id("cookiesNotificationConfirm").click();

company_name = driver.find_element_by_class_name('zebraTable__td zebraTable__td--companyName')

print(company_name)

I have been trying for 4 hours and can't get it to work. I tried different locator methods such as XPath and link text, but all I get is an empty company name like this: "[ ]".

Does anyone know how Selenium can find this exact piece of text, "Liebherr-Hausgeräte Ochsenhausen GmbH"?

Thanks a lot.

Yankzz

2 Answers

What you are looking for is already in the page source, under

<div data-company-search><div data-var-name="companyResults" data

so you do not need Selenium to get it. Just fetch the page with requests and find the data using Beautiful Soup.
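
A minimal sketch of that approach. It assumes that the table cells shown in the question (`td.zebraTable__td--companyName`) are present in the static HTML the server returns, which has not been verified here. If the names only live in the data attribute mentioned above, the selector would need to change. The function names and the `User-Agent` header are illustrative choices, not something the site requires as far as is known.

```python
from typing import List

import requests
from bs4 import BeautifulSoup

# Search URL taken from the question (page 4 of the result list).
SEARCH_URL = (
    "https://de.statista.com/companydb/suche?idCountry=276&idBranch=0"
    "&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000"
    "&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4"
)

# Offline sample built from the HTML fragment quoted in the question.
SAMPLE_HTML = """
<table><tr>
<td class="zebraTable__td zebraTable__td--companyName"><a href="/unternehmen/8116602/schneider-electric-holding-germany-gmbh" data-gtm="companySearch__searchResult--76">
                        Schneider Electric Holding Germany GmbH
                    </a></td>
</tr></table>
"""


def extract_company_names(html: str) -> List[str]:
    """Return the link text of every company-name cell found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Matching on one class is enough; the <a> inside the cell holds the name.
    links = soup.select("td.zebraTable__td--companyName a")
    return [link.get_text(strip=True) for link in links]


def fetch_company_names(url: str) -> List[str]:
    """Download one result page and extract the company names from it."""
    # Assumption: a browser-like User-Agent may help; the site may still block scripts.
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()
    return extract_company_names(response.text)


if __name__ == "__main__":
    # Runs against the offline sample; call fetch_company_names(SEARCH_URL) for the live page.
    print(extract_company_names(SAMPLE_HTML))
```

Keeping the parsing separate from the download makes it possible to check the selector against a saved copy of the page before sending any requests.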

balderman
  • You are right! But I need this part of the code for a script that generates a list of all the employee names. My fault, I should have explained the whole thing, sorry – Yankzz Aug 31 '20 at 10:17
To print the text Schneider Electric Holding Germany GmbH, you need to use WebDriverWait with visibility_of_element_located(). You can use either of the following locator strategies:

  • Using CSS_SELECTOR and the text attribute:

    driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#cookiesNotificationConfirm"))).click()
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.zebraTable.zebraTable--companies tr:nth-child(2)>td.zebraTable__td.zebraTable__td--companyName>a"))).text)
    
  • Using XPATH and get_attribute("innerHTML"):

    driver.get('https://de.statista.com/companydb/suche?idCountry=276&idBranch=0&revenueFrom=-1000000000000000000&revenueTo=1000000000000000000&employeesFrom=0&employeesTo=100000000&sortMethod=revenueDesc&p=4')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@id='cookiesNotificationConfirm']"))).click()
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='zebraTable zebraTable--companies']//following::tr[2]/td[@class='zebraTable__td zebraTable__td--companyName']/a"))).get_attribute("innerHTML"))
    
  • Console Output:

    Schneider Electric Holding Germany GmbH
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
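
For context on why the locators above combine the two classes into one CSS selector: a class-name locator expects a single class, so passing the whole space-separated `class` attribute, as in the question, does not match the cell. The sketch below shows one way to build the combined selector and, as an extension beyond the single-element lookups above, to collect the text of every matching link with `visibility_of_all_elements_located`. The helper names `class_attr_to_css` and `visible_company_names` are hypothetical, and the Selenium part has not been run against the live site.

```python
def class_attr_to_css(tag, class_attr):
    """Turn a tag name plus a space-separated class attribute into one CSS selector."""
    classes = class_attr.split()
    return tag + "".join("." + name for name in classes)


# Selector for the link inside each company-name cell, built from the class
# attribute shown in the question.
COMPANY_LINK_CSS = (
    class_attr_to_css("td", "zebraTable__td zebraTable__td--companyName") + " > a"
)


def visible_company_names(driver, timeout=20):
    """Wait until all matching links are visible and return their text."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    links = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located((By.CSS_SELECTOR, COMPANY_LINK_CSS))
    )
    # .text is read from each WebElement; printing the element itself only
    # shows the object, not the company name.
    return [link.text.strip() for link in links]


if __name__ == "__main__":
    print(COMPANY_LINK_CSS)
    # prints: td.zebraTable__td.zebraTable__td--companyName > a
```

With an open `driver` that has already loaded the result page and dismissed the cookie banner, `visible_company_names(driver)` would be expected to return one string per table row.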


undetected Selenium
  • This worked like a charm, thanks a lot. I tried to integrate this code into my full script, which tries to generate a list of all the company names with 500 or more employees, but it always takes just the first name in the list if I repeat the command. I think it's because .get_attribute() only gets one attribute and not all attributes found by the XPath?! – Yankzz Aug 31 '20 at 10:14
  • @Yankzz This answer is specifically for extracting the text **Schneider Electric Holding Germany GmbH**. For the list of all the company names we need to adjust the locators. Can you raise a new question with your new requirement please? – undetected Selenium Aug 31 '20 at 10:18
  • Thanks, I opened a new question: https://stackoverflow.com/questions/63669207/python-selenium-webscrape-table-with-text-in-html-using-webdriverwait – Yankzz Aug 31 '20 at 10:39