Selenium wait for JavaScript timing out

Question

I want to scrape https://wiki.openstreetmap.org/wiki/Key:office, specifically the table containing all the tags, i.e. everything within:

<table class="wikitable taginfo-taglist">...</table>

Since everything within:

<div class="taglist" ...> ... </div>

(the parent of the table) is generated by JavaScript, I thought this code would work:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
    
options = Options()
options.add_argument("--headless")
caps = webdriver.DesiredCapabilities().FIREFOX
caps["marionette"] = True
driver = webdriver.Firefox(options=options, capabilities=caps, executable_path='../statics/geckodriver')
    
    
def get_tag_soup(url):
    driver.get(url)
    try:
        table = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME , "wikitable taginfo-taglist")))
        soup = BeautifulSoup(table.get_attribute('innerHTML'), 'lxml') 
    except Exception as e:
        soup = e
    
    return soup 

get_tag_soup('https://wiki.openstreetmap.org/wiki/Key:office')

But when I run this code I just get a selenium.common.exceptions.TimeoutException('', None, None). More frustratingly, if I use WebDriverWait on the parent of the table with EC.presence_of_element_located((By.CLASS_NAME, "taglist")), it sometimes works.

If waiting for the parent works, why not do that, then something like table = the_parent.find_element_by_classname('wikitable taginfo-taglist') — Breaks Software, Feb 05 '21 at 10:47
Waiting for the parent only works sometimes. Is there a way to wait for the whole site? — Thagor, Feb 05 '21 at 10:48

undetected Selenium · Accepted Answer · 2021-02-05T10:57:49.073

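To extract the table containing all the tags, use WebDriverWait with visibility_of_element_located() instead of presence_of_element_located(), and use a locator that matches both classes. By.CLASS_NAME accepts only a single class name, so the compound value "wikitable taginfo-taglist" will not match the table. Either of the following locator strategies works: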

Using CSS_SELECTOR:

driver.get("https://wiki.openstreetmap.org/wiki/Key:office")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.wikitable.taginfo-taglist"))).text)

Using XPATH:

driver.get("https://wiki.openstreetmap.org/wiki/Key:office")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='wikitable taginfo-taglist']"))).text)

Console Output:

Key Value Element Description Map rendering Image Count
office accountant An office for an accountant.
6 895
1 967
14
office advertising_agency A service-based business dedicated to creating, planning, and handling advertising.
3 916
580
3
office architect An office for an architect or group of architects.
5 715
1 239
12
office association An office of a non-profit organisation, society, e.g. student, sport, consumer, automobile, bike association, etc.
13 054
3 286
50
office charity An office of a charitable organization
696
384
7
office company An office of a private company
129 801
36 951
608
office consulting An office for a consulting firm, providing expert professional advice to other companies or organisations.
1 341
162
4
office coworking An office where people can go to work (might require a fee); not limited to a single employer
1 297
320
7
office diplomatic
6 634
4 065
95
office educational_institution An office for an educational institution.
14 172
8 563
175
office employment_agency An office for an employment service.
7 300
1 771
43
office energy_supplier An office for a energy supplier.
2 237
1 112
19
office engineer An office for an engineer or group of engineers.
454
98
2
office estate_agent A place where you can rent or buy a house.
44 813
8 042
39
office financial An office of a company in the financial sector
4 891
1 588
24
office forestry A forestry office
523
741
9
office foundation An office of a foundation
1 757
542
10
office government An office of a (supra)national, regional or local government agency or department
98 289
70 569
2 300
office guide An office for tour guides, mountain guides, dive guides, etc.
587
168
1
office insurance An office at which you can take out insurance policies.
34 693
6 475
91
office it An office for an IT specialist.
9 486
2 039
51
office lawyer An office for a lawyer.
22 881
4 841
22
office logistics An office for a forwarder / hauler.
2 796
677
8
office moving_company An office which offers a relocation service.
605
252
4
office newspaper An office of a newspaper
3 511
1 450
27
office ngo An office for a non-profit, non-governmental organisation (NGO).
12 693
3 565
58
office notary An office for a notary public (common law)
3 860
548
9
office political_party An office of a political party
3 354
1 017
8
office property_management Office of a company, which manages a real estate property.
796
162
2
office quango An office of a quasi-autonomous non-governmental organisation.
366
233
4
office religion office of a community of faith
5 807
2 172
43
office research An office for research and development
3 667
4 545
348
office surveyor An office of a person doing surveys, this can be risk and damage evaluations of properties and equipment, opinion surveys or statistics.
451
109
1
office tax_advisor An office for a financial expert specially trained in tax law
5 053
823
4
office telecommunication An office for a telecommunication company
16 968
4 335
77
office visa An office of an organisation or business which offers visa assistance
95
1
0
office water_utility The office for a water utility company or water board.
743
908
20
office yes Generic tag for unspecified office type.
27 434
36 155
420
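
The .text output above spreads each table row over several lines. If the cell structure is needed, one option is to take the element's markup with get_attribute("outerHTML") and parse it. The sketch below uses only the standard library's html.parser. The TableRows class, the table_html_to_rows function and the sample HTML are made up for illustration, and the real table's markup may need extra handling (images, nested elements).

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_html_to_rows(html):
    """Return the table as a list of rows, each a list of cell texts."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample standing in for table.get_attribute("outerHTML").
sample = (
    "<table><tr><th>Key</th><th>Value</th></tr>"
    "<tr><td><a>office</a></td><td>accountant</td></tr></table>"
)
rows = table_html_to_rows(sample)
```

With a live driver, the same function could be fed the outerHTML of the element returned by the wait.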

Note: Make sure the browser viewport is maximized, as follows:

options.add_argument("start-maximized")
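
The timeout in the question most likely comes from the locator rather than from the wait itself: By.CLASS_NAME takes a single class name, so a space-separated value such as "wikitable taginfo-taglist" does not match the table. A minimal sketch of the working pattern follows. The names compound_class_to_css and wait_for_table are hypothetical helpers, not part of Selenium, and the Selenium imports sit inside the function only so that the selector helper can be tried without a browser.

```python
def compound_class_to_css(tag, class_attr):
    """Build a CSS selector from a tag name and a space-separated class attribute.

    Hypothetical helper for illustration; not part of Selenium.
    """
    classes = class_attr.split()
    return tag + "".join("." + name for name in classes)


def wait_for_table(driver, url, timeout=20):
    """Load url and return the tag table once it is visible (hypothetical helper)."""
    # Imported here so the selector helper above works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    selector = compound_class_to_css("table", "wikitable taginfo-taglist")
    driver.get(url)
    return WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, selector))
    )
```

Given an existing driver, wait_for_table(driver, "https://wiki.openstreetmap.org/wiki/Key:office") would return the table element, or raise TimeoutException if it does not become visible within the timeout.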

Thanks for the answer, but both the XPath and the CSS selector produce the same timeout error for me. Maybe the issue is that the driver isn't rendering the JavaScript? — Thagor, Feb 05 '21 at 10:53
@Thagor Checkout the updated answer and let me know the status. — undetected Selenium, Feb 05 '21 at 11:00
Sadly, it does not solve the issue. I tried waiting for 120 seconds, which doesn't help either, and I tried setting a window size, which does nothing as well. — Thagor, Feb 05 '21 at 11:06
@Thagor Can you just copy and paste my code and retest please? — undetected Selenium, Feb 05 '21 at 11:18
I played a bit more with the code and found that when I do `time.sleep(10)` the table gets rendered, but (By.CSS_SELECTOR, "wikitable taginfo-taglist") only works for `"taglist"`, not for `"table.wikitable.taginfo-taglist"`; there the code times out. — Thagor, Feb 05 '21 at 11:19
Okay, I tried it @DebanjanB and the CSS selector works! `"wikitable taginfo-taglist"` wasn't the right selector. — Thagor, Feb 05 '21 at 11:20
