I'm trying to parse the following URL, "https://projects.worldbank.org/en/projects-operations/document-detail/P179109?type=projects", using Selenium in Python, to do the following:

1- Go to the URL

2- Wait for the page to load either the table of documents or the "No data available." element, as in this example: "https://projects.worldbank.org/en/projects-operations/document-detail/P179227?type=projects"

3- Parse the content of the table of documents

I tried several versions of the code; this is the latest:

from selenium.webdriver.chrome.service import Service
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

urls = [
    'https://projects.worldbank.org/en/projects-operations/document-detail/P179109?type=projects',
    'https://projects.worldbank.org/en/projects-operations/document-detail/P179278?type=projects',
]

driver = webdriver.Chrome()

wait = WebDriverWait(driver, 10)
l1=[]
for url in urls:
    driver.get(url)
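    # Wait until either a table cell or the "No data available." block is visible (EC.any_of needs Selenium 4)
    wait.until(EC.any_of(
        EC.visibility_of_element_located((By.XPATH, '//tr[@_ngcontent-c2=""]/td[@_ngcontent-c2=""]')),
        EC.visibility_of_element_located((By.CSS_SELECTOR, 'div.procurement_notices[_ngcontent-c2=""]')),
    ))
    # find_elements returns an empty list when the page shows no documents
    l1.append([td.text for td in driver.find_elements(By.XPATH, '//tr[@_ngcontent-c2=""]/td[@_ngcontent-c2=""]')])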

1 Answer

This is a classic XY problem. Rather than driving a browser, you can get the information you're after by querying the API endpoint that the page itself calls to hydrate its document list. Here is how to do that for the first page you mention:

import pandas as pd
import requests

# API endpoint the project page calls to load its document list
api_url = 'https://search.worldbank.org/api/v2/wds?format=json&includepublicdocs=1&fl=docna,lang,docty,repnb,docdt,doc_authr,available_in&os=0&rows=20&proid=P179109&apilang=en&fct=countryname'
r = requests.get(api_url)
r.raise_for_status()  # stop early on an HTTP error
# 'documents' maps a key to one record per entry; keep only the records
docs_list = list(r.json()['documents'].values())
df = pd.DataFrame(docs_list)
print(df)

Result in terminal:

    id  docna   docty   lang    entityids   repnb   docdt   display_title   pdfurl  listing_relative_url    url_friendly_title  new_url     guid    available_in    fullavailablein     url
0   33872193    {'0': {'docna': 'Concept Project Information D...   Project Information Document    English     {'entityid': '33872193'}    PIDC34319   2022-07-22T00:00:00Z    Concept Project Information Document\n ...  http://documents.worldbank.org/curated/en/0999...   /projects/documents/2022/07/33872193/concept-p...   http://documents.worldbank.org/curated/en/0999...   2022/07/33872193/P17910909e91120a0a1cc04f1ca93...   099915007222220480  English     []  http://documents.worldbank.org/curated/en/0999...
1   33871622    {'0': {'docna': 'Concept Environmental and Soc...   Environmental and Social Review Summary     English     {'entityid': '33871622'}    ESRSC02925  2022-07-21T00:00:00Z    Concept Environmental and Social\n ...  http://documents.worldbank.org/curated/en/0991...   /projects/documents/2022/07/33871622/concept-e...   http://documents.worldbank.org/curated/en/0991...   2022/07/33871622/P1791090dbf46c0c30993306f24d6...   099145007212228304  English     []  http://documents.worldbank.org/curated/en/0991...
2   33862268    {'0': {'docna': 'Minutes of a Virtual Meeting ...   Minutes     English     {'entityid': '090224b08a420d2a_2_0'}    173649  2022-05-24T00:00:00Z    Minutes of a Virtual Meeting of the\n ...   http://documents.worldbank.org/curated/en/5814...   /projects/documents/2022/05/33862268/minutes-v...   http://documents.worldbank.org/curated/en/5814...   2022/05/33862268/Minutes-of-a-Virtual-Meeting-...   581451657570685786  English     []  http://documents.worldbank.org/curated/en/5814...
3   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
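
The same request can be repeated for several project IDs, which also covers pages that show "No data available.". The following is only a sketch under a few assumptions: that `proid` is the only query parameter that changes between projects, that a project without documents returns a `documents` mapping with no usable records, and that the all-NaN last row above comes from an entry without document fields. The helper names `build_api_url`, `documents_to_frame` and `fetch_documents` are made up for illustration.

```python
from urllib.parse import urlencode

import pandas as pd
import requests

BASE_URL = 'https://search.worldbank.org/api/v2/wds'


def build_api_url(project_id):
    """Build the document-search URL for one project ID (e.g. 'P179109')."""
    params = {
        'format': 'json',
        'includepublicdocs': 1,
        'fl': 'docna,lang,docty,repnb,docdt,doc_authr,available_in',
        'os': 0,
        'rows': 20,
        'proid': project_id,  # assumption: only this value differs per project
        'apilang': 'en',
        'fct': 'countryname',
    }
    # safe=',' keeps the commas in 'fl' unescaped, as in the original URL
    return f'{BASE_URL}?{urlencode(params, safe=",")}'


def documents_to_frame(payload):
    """Turn a decoded JSON payload into a DataFrame, one row per document."""
    records = list(payload.get('documents', {}).values())
    df = pd.DataFrame(records)
    # Drop rows in which every column is NaN (entries without document fields)
    return df.dropna(how='all').reset_index(drop=True)


def fetch_documents(project_ids, timeout=30):
    """Fetch the document lists of several projects and combine them."""
    frames = []
    for pid in project_ids:
        r = requests.get(build_api_url(pid), timeout=timeout)
        r.raise_for_status()
        df = documents_to_frame(r.json())
        if not df.empty:  # projects without documents contribute nothing
            frames.append(df.assign(project_id=pid))
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Calling `fetch_documents(['P179109', 'P179227'])` would then return one combined DataFrame, with a `project_id` column identifying the source project of each row.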

As for your original question (which is likely not relevant to the issue at hand), see the Selenium documentation on Waits.

Barry the Platipus