How to scrape dynamic table data (world bank website)

Question

I am trying to scrape the Results Framework information for several projects on the World Bank's site. I am using Scrapy, but I am open to using Selenium instead.

Link: (https://projects.worldbank.org/en/projects-operations/project-detail/P153012)

The problems I am facing are:

The tables are generated dynamically, and for some projects they are missing entirely or have fewer fields. Because the tables are built with JavaScript, I can't use Scrapy, as I don't know how to handle JavaScript with it.
With Selenium, the code I am using is below, but it only lets me extract all of the text at once, not the individual cells. Can the cells be extracted separately, or am I on a fool's errand?

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

url = "https://projects.worldbank.org/en/projects-operations/project-detail/P153012"
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service("/Users/thenewcomputer/Downloads/chromedriver"))
driver.get(url)
# The XPath starts with //, so it already searches the whole page; no outer loop over "ng-tns-c7-3" elements is needed
tables = driver.find_elements(By.XPATH, '//*[@id="results"]/div/div/div[2]/div/div[1]/div/div/ul/li/table')
for table in tables:
    print(table.text)  # to check whether the locator is working

Please let me know if there is an easier way to do this. Thanks in advance!
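Because some projects have fewer fields or no table at all, it can help to normalise whatever rows are found into one fixed shape. A minimal sketch, assuming the cells have already been pulled out as a list of rows (lists of strings) and that the first row holds the column headers; `rows_to_records` is a hypothetical helper, not part of Scrapy or Selenium. In Selenium, a table that never appears surfaces as a `TimeoutException` from `WebDriverWait.until()`, which could be caught and mapped to an empty list before calling this.

```python
from typing import Dict, List, Optional


def rows_to_records(rows: List[List[str]]) -> List[Dict[str, Optional[str]]]:
    """Map [header_row, data_row, ...] to one dict per data row.

    Assumptions (not taken from the page itself): the first row holds the
    column names; short rows get None for their missing columns, extra cells
    past the header are dropped, and an absent table (no rows) gives [].
    """
    if not rows:
        return []
    header, *data = rows
    records: List[Dict[str, Optional[str]]] = []
    for row in data:
        # Pad short rows so every record ends up with the same keys
        padded: List[Optional[str]] = list(row) + [None] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


# Made-up rows for illustration: the data row is missing its last cell
print(rows_to_records([["Name", "Baseline", "Target"], ["Value", "0"]]))
# [{'Name': 'Value', 'Baseline': '0', 'Target': None}]
```

Padding with `None` rather than skipping short rows keeps every project's output in the same columns, which makes it easier to combine results across projects later.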

undetected Selenium · Answer 1 · 2021-12-27T16:37:10.180

To print the text of a table, for example the one under the Results Framework heading, you need to induce WebDriverWait for visibility_of_element_located(), and you can use the following locator strategy:

Code Block:

driver.get("https://projects.worldbank.org/en/projects-operations/project-detail/P153012")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h2[starts-with(., 'Results Framework')]//following::div[1]//ul//table"))).text)

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
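The `.text` above joins all cells into one string. To keep the cells apart, one option is to take the rendered table's HTML (Selenium's `get_attribute("outerHTML")` on the element returned by the wait) and parse it. A minimal sketch using only the standard library's `html.parser`; `TableCellParser` and `table_html_to_rows` are hypothetical names, and the code assumes plain `<tr>`/`<td>`/`<th>` markup with explicit closing tags (which a browser's `outerHTML` serialisation includes) and no nested tables.

```python
from html.parser import HTMLParser
from typing import List


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []
        elif tag == "br" and self._in_cell:
            # Treat a line break inside a cell as whitespace
            self._cell.append("\n")

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            # Collapse whitespace left over from the page's indentation
            self._row.append(" ".join("".join(self._cell).split()))
            self._in_cell = False
        elif tag == "tr":
            self.rows.append(self._row)
            self._row = []

    def handle_data(self, data):
        # Text inside nested tags such as <span> is still collected
        if self._in_cell:
            self._cell.append(data)


def table_html_to_rows(html: str) -> List[List[str]]:
    """Return one list of cell strings per table row."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

With the answer's wait, assigning the result of `WebDriverWait(driver, 20).until(...)` to a variable and passing its `get_attribute("outerHTML")` to `table_html_to_rows` would give one list per row. Parsing the markup avoids having to guess cell boundaries with regex on the flattened text.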

Console Output:

Increase of Municipality of Fortaleza own-source revenue capacity through planning and land-value capture instruments Value 0 - 17.75% increase in property tax revenues- 546.2% increase in PMF's revenues through Fortaleza online- 171.90% increase inSEUMA's revenues collected from use of urban instruments - 20% increase in property tax revenues- 100% increase in PMF's revenues through Fortaleza Online- 115% increase in SEUMA'srevenues collected from use of urban instruments
Date August 1, 2016 April 28, 2021 June 30, 2023

Comment

Hi, thanks, this helped a lot for the other tables, but for the Results Framework table (the last table on the page) I can't find the h3 tag. Also, given that this is printed output, is there a way to get each cell's output separately, or would that only be possible with regex? — Anikan, Dec 27 '21 at 16:07
@Anikan Can you raise a new question for your new requirement please? — undetected Selenium, Dec 27 '21 at 16:08
Hi Debanjan, but the original question was also only about the Results Framework information. Thanks for all your help, though. — Anikan, Dec 27 '21 at 16:26
@Anikan Check out the updated answer and let me know the status. — undetected Selenium, Dec 27 '21 at 16:37
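The comments suggest the heading element is not the same for every table (the thread mentions an `h3`, while the locator in the answer uses an `h2`). A small hypothetical helper that rebuilds the answer's XPath for any heading tag and heading text, under the assumption that each table still sits in a `ul` inside the first `div` after its heading:

```python
def table_xpath(heading_text: str, heading_tag: str = "h2") -> str:
    """Build the answer's locator for the table that follows a given heading.

    Assumption: the page keeps the layout the answer relies on, i.e. the
    table is inside a <ul> within the first <div> after the heading element.
    """
    if "'" in heading_text:
        # The XPath string literal below is single-quoted, so a quote would break it
        raise ValueError("heading_text must not contain a single quote")
    return (
        f"//{heading_tag}[starts-with(., '{heading_text}')]"
        "//following::div[1]//ul//table"
    )
```

It can then replace the hard-coded string in the answer's call, for example `EC.visibility_of_element_located((By.XPATH, table_xpath("Results Framework", "h3")))`; whether a given heading is actually an `h2` or an `h3` has to be checked in the browser's inspector.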
