I am trying to scrape the results framework information for several projects on the World Bank's site. I am currently using Scrapy, but I am open to using Selenium instead.
Link: (https://projects.worldbank.org/en/projects-operations/project-detail/P153012)
The problem I am facing is:
The tables are generated dynamically, and for some projects they are missing entirely or have fewer fields. As far as I can tell this rules out Scrapy, since I don't know how to handle JavaScript-rendered content with it.
With Selenium I am using the following code, but it only lets me extract all of the table text at once, not the individual cell values. Can this be done, or am I on a fool's errand?
from selenium import webdriver

url = "https://projects.worldbank.org/en/projects-operations/project-detail/P153012"
driver = webdriver.Chrome(executable_path="/Users/thenewcomputer/Downloads/chromedriver")
driver.get(url)

tables = driver.find_elements_by_class_name("ng-tns-c7-3")
for table in tables:
    title = table.find_elements_by_xpath('//*[@id="results"]/div/div/div[2]/div/div[1]/div/div/ul/li/table')
    title
    for x in title:
        print(x.text)  # printed only to check whether this was working correctly
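To make the goal concrete, here is a minimal sketch of the kind of per-cell output being asked for. It assumes each results table is plain `<table>`/`<tr>`/`<td>` markup without nested tables, which has not been verified against the World Bank page. The names `TableCellParser` and `table_to_rows` are hypothetical helpers, and the sample HTML is made up for illustration. It uses only the standard library, so it runs without a browser.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper; assumes simple, non-nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_rows(table_html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableCellParser()
    parser.feed(table_html)
    parser.close()
    return parser.rows


# Made-up sample markup, not taken from the real page.
sample = (
    "<table>"
    "<tr><th>Indicator</th><th>Baseline</th></tr>"
    "<tr><td> Roads built </td><td>0</td></tr>"
    "</table>"
)
for row in table_to_rows(sample):
    print(row)
```

With Selenium, the HTML of a located table element could presumably be passed in as `table_to_rows(x.get_attribute("outerHTML"))`, where `x` is one of the table elements from the loop above. Whether the rendered tables actually match the assumed markup would need to be checked in the browser's inspector.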
Please let me know if there is an easier way of doing this. Thanks in advance.