
I am using Python with Selenium to scrape job position information from multiple pages, but so far only the first element's information is printed.

Another difficulty is that the website I am scraping is a job portal for the UN, which means it also aggregates listings from other UN websites. To get the original link, I have to click a position, then click "Apply now", which redirects to the original UN website.

Thank you so much for reading my question. I come from a foreign policy background and have just started learning web scraping.

#import required packages
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time


#define the driver variable
driver = webdriver.Chrome()

#navigate to url given
driver.get("https://www.unjobnet.org/jobs?orgtypes%5B0%5D=United+Nations+System&apptypes%5B0%5D=Internship&keywords=&orderby=closing")

#wait 5 seconds for elements to load
time.sleep(5)

#locate elements based on the specified XPath
divs = driver.find_elements(By.XPATH,'//*[@id="main"]/div[2]')

#get the text attribute of each element and print it
for div in divs:
    title = div.find_element(By.XPATH,'//*[@id="main"]/div[2]/div/div[1]/div/div[2]/div[1]').text
    area =  div.find_element(By.XPATH,'//*[@id="main"]/div[2]/div/div[1]/div/div[2]/div[2]').text
    place =  div.find_element(By.XPATH,'//*[@id="main"]/div[2]/div/div[1]/div/div[2]/div[4]').text
    deadline = div.find_element(By.XPATH,'//*[@id="main"]/div[2]/div/div[1]/div/div[2]/div[7]/span[2]').text

    print(title, area, place, deadline)

Printed result:

Communications - Intern UNDP - United Nations Development Programme New Delhi (India) Closing soon: 19 Jul 2023
Andrej Kesely
HAIQI WAN
  • The xpath expression `//*[@id="main"]/div[2]` found only one matching element. In fact, I don't see how it could **ever** find more than one element. Why were you expecting more than one match? – John Gordon Jul 19 '23 at 17:14
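
To illustrate the point in the comment above: in Selenium, an XPath that starts with `//` is evaluated against the whole document even when it is passed to `div.find_element(...)`, so every iteration returns the first match on the page. A search scoped to the current element (in Selenium, a path starting with `.//`) returns that element's own children. The sketch below shows the same distinction with the standard-library `xml.etree.ElementTree` on made-up markup. It does not use Selenium, and the class names `card`, `title`, and `place` are assumptions for illustration, not the real structure of the site.

```python
import xml.etree.ElementTree as ET

# Made-up markup standing in for a results page: one container, three job cards.
PAGE = """
<div id="main">
  <div class="card"><div class="title">Job A</div><div class="place">Nairobi</div></div>
  <div class="card"><div class="title">Job B</div><div class="place">Geneva</div></div>
  <div class="card"><div class="title">Job C</div><div class="place">New York</div></div>
</div>
"""

root = ET.fromstring(PAGE)

# Select one element per job card, not the single outer container.
cards = root.findall("./div[@class='card']")

# Searching from the document root inside the loop gives the first title every time.
from_root = [root.find(".//div[@class='title']").text for _ in cards]

# Searching relative to each card gives that card's own title.
per_card = [card.find("./div[@class='title']").text for card in cards]

print(from_root)  # ['Job A', 'Job A', 'Job A']
print(per_card)   # ['Job A', 'Job B', 'Job C']
```

The same two changes would apply to the Selenium loop: select a locator that matches each job card, and start the inner XPath expressions with `.` so they are resolved relative to the card.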

2 Answers


You can skip Selenium entirely and load the data directly from the site's pagination API (the page receives its data from the server as JSON from an external URL):

import requests
import pandas as pd
from bs4 import BeautifulSoup

api_url = "https://www.unjobnet.org/jobs"
params = {
    "orgtypes[0]": "United Nations System",
    "apptypes[0]": "Internship",
    "keywords": "",
    "orderby": "closing",
    "page": "1",
}

headers = {'X-Requested-With': 'XMLHttpRequest'}

all_dfs = []

for params['page'] in range(1, 3):  # <-- increase number of pages here
    data = requests.get(api_url, params=params, headers=headers).json()
    all_dfs.append(pd.DataFrame(data['jobs']))

df = pd.concat(all_dfs)
df['Deadline'] = df['Deadline'].apply(lambda x: BeautifulSoup(x, 'html.parser').text)
df.pop('LocationFlags')

print(df)

Prints:

JobID Title Department Grade Level CitiesCountries AppType Logo ShortName LongName Added Updated Deadline RecruitmentPlace DatePosted
60526362 Communications - Intern Intern Intern New Delhi (India) Internship 1617066589_f1b878cccc2893c87abf.png UNDP United Nations Development Programme 1 week ago 2023-07-19 14:06:04 Closing soon: 19 Jul 2023 2023-07-11 14:01:03
60586283 Internship Paid- Communications Support Intern Kingston (Jamaica) Internship 1617066589_f1b878cccc2893c87abf.png UNDP United Nations Development Programme 6 days ago 2023-07-19 14:01:03 Closing soon: 19 Jul 2023 2023-07-12 23:01:04
60080967 INTERN [Temporary] Department for General Assembly and Conference Management I-1 Intern New York (United States) Internship 2bfa093bb99a69d9585819674270ef76.svg UN DGACM Department for General Assembly and Conference Management 2 weeks ago 2023-07-19 14:04:08 Closing soon: 19 Jul 2023 2023-06-30 14:04:08
60568657 Internship Paid - Social Resilience & Inclusion Intern Kingston (Jamaica) Internship 1617066589_f1b878cccc2893c87abf.png UNDP United Nations Development Programme 1 week ago 2023-07-19 11:01:03 Closing soon: 19 Jul 2023 2023-07-12 14:01:04
60278375 INTERN - Data Analyst & Management [Temporary] United Nations Human Settlements Programme I-1 Intern Nairobi (Kenya) Internship 6bb959ad3e762845c59a3a8808e1871b.jpg UN-HABITAT United Nations Human Settlements Programme 2 weeks ago 2023-07-19 14:04:08 Closing soon: 19 Jul 2023 2023-07-05 14:04:07
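
If the number of pages is not known in advance, one option is to keep requesting pages until one comes back empty. The following is a minimal sketch of that loop. It assumes the endpoint returns an empty `jobs` list past the last page, which has not been verified here. `collect_jobs`, `fake_fetch`, and `FAKE_PAGES` are hypothetical names, and the fetcher is a stand-in so the example runs without network access.

```python
from typing import Callable, Dict, List


def collect_jobs(fetch_page: Callable[[int], List[Dict]], max_pages: int = 50) -> List[Dict]:
    """Request pages 1, 2, 3, ... until a page is empty or max_pages is reached."""
    jobs: List[Dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # assumed signal that there are no more results
            break
        jobs.extend(batch)
    return jobs


# Stand-in data and fetcher (hypothetical) so the sketch runs offline.
FAKE_PAGES = {
    1: [{"JobID": 1, "Title": "Job A"}, {"JobID": 2, "Title": "Job B"}],
    2: [{"JobID": 3, "Title": "Job C"}],
}


def fake_fetch(page: int) -> List[Dict]:
    return FAKE_PAGES.get(page, [])


all_jobs = collect_jobs(fake_fetch)
print(len(all_jobs))  # 3
```

With the real endpoint, `fetch_page` could wrap the `requests.get(api_url, params=..., headers=headers).json()['jobs']` call from the code above; the `max_pages` cap keeps the loop bounded if the empty-page assumption turns out to be wrong.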
Andrej Kesely

To scrape the job position information from the webpage, wait for the elements with WebDriverWait and visibility_of_all_elements_located(), collect each field into its own list, and combine the lists with zip(). You can use the following locator strategies:

  • Code block:

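    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    driver.get("https://www.unjobnet.org/jobs?driver.get(%22https://www.unjobnet.org/jobs?orgtypes%5B0%5D=United+Nations+System&apptypes%5B0%5D=Internship&keywords=&orderby=closing")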
    titles = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[starts-with(@href, '/jobs/detail')]")))]
    areas = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[@class='link-dark' and starts-with(@href, '/organizations')]")))]
    places = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[starts-with(@href, '/jobs?locations')]//parent::div[1]")))]
    deadlines = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[starts-with(@class, 'text-danger')]")))]
    for title,area,place,deadline in zip(titles,areas,places,deadlines):
      print(f"Title {title} at {area} in {place} closing on {deadline}")
    driver.quit()
    
  • Console output:

    Title Communications - Intern at UNDP - United Nations Development Programme in New Delhi (India) closing on Closing soon: 19 Jul 2023
    Title UN·E ASSISTANT·E FORMATION & COMMUNAUTÉ PROFESSIONNELLE (MEAL) EN STAGE at Action contre la Faim in France closing on Closing soon: 19 Jul 2023
    Title Internship Paid- Communications Support at UNDP - United Nations Development Programme in Kingston (Jamaica) closing on Closing soon: 19 Jul 2023
    Title INTERN [Temporary] at UN DGACM - Department for General Assembly and Conference Management in New York (United States) closing on Closing soon: 19 Jul 2023
    Title Internship Paid - Social Resilience & Inclusion at UNDP - United Nations Development Programme in Kingston (Jamaica) closing on Closing soon: 19 Jul 2023
    Title INTERN - Data Analyst & Management [Temporary] at UN-HABITAT - United Nations Human Settlements Programme in Nairobi (Kenya) closing on Closing soon: 19 Jul 2023
    Title INTERN - ENVIRONMENT AFFAIRS [Temporary] at UNEP - United Nations Environment Programme in Nairobi (Kenya) closing on Closing soon: 19 Jul 2023
    Title Internship Paid - Programme Support at UNDP - United Nations Development Programme in Kingston (Jamaica) closing on Closing soon: 19 Jul 2023
    Title Communications and Data Analysis Intern at UNDP - United Nations Development Programme in Minsk (Belarus) closing on Closing soon: 19 Jul 2023
    Title Health, Safety, Social and Environmental (HSSE) Management Intern at UNOPS - United Nations Office for Project Services in Copenhagen (Denmark) closing on Closing soon: 19 Jul 2023
    Title INTERN - PUBLIC INFORMATION [Temporary] at UN - United Nations Secretariat in Quito (Ecuador) closing on Closing soon: 19 Jul 2023
    Title RBA Front Office Intern (in-person) - 2 positions at UNDP - United Nations Development Programme in New York (United States) closing on Closing soon: 20 Jul 2023
    Title Internship – Aviation security policy support [Temporary] at ICAO - International Civil Aviation Organization in Montreal (Canada) closing on Closing soon: 20 Jul 2023
    Title INT 2023 22 - Intern - Software Development (Displacement Tracking Matrix (DTM)) - Geneva, Switzerland at IOM - International Organization for Migration in Geneva (Switzerland) closing on Closing soon: 20 Jul 2023
    Title Stagiaire_Protection de l'Enfant, Bujumbura-Burundi, 6 mois (exclusivement pour les Nationaux) at UNICEF - United Nations Children's Fund in Bujumbura (Burundi) closing on Closing soon: 20 Jul 2023
    Title Communications Intern at UNOPS - United Nations Office for Project Services in Tirana (Albania) closing on Closing soon: 20 Jul 2023
    Title INTERN - PROGRAMME MANAGEMENT [Temporary] at UN - United Nations Secretariat in New York (United States) closing on Closing soon: 20 Jul 2023
    Title INTERN - Communications [Temporary] at UNJSPF - United Nations Joint Staff Pension Fund in New York (United States) closing on Closing soon: 20 Jul 2023
    Title Consultants to support the technical assistance for the development of teaching manuals and internship guide for the vocational modules of the the specialized nursing qualification in neonatolog at WHO - World Health Organization in Maputo (Mozambique) closing on Closing soon: 20 Jul 2023
    Title INTERN - PUBLIC INFORMATION (Chinese News) [Temporary] at UN - United Nations Secretariat in New York (United States) closing on Closing soon: 20 Jul 2023
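
One caveat with collecting each field into a separate list and combining them with zip(): zip() stops at the shortest list, so if one listing lacks a field, the remaining rows can be paired with the wrong values without any error. The sketch below uses made-up data to show the effect and a simple length check that would catch it; the job names and places are placeholders, not output from the site.

```python
from itertools import zip_longest

# Made-up data: three titles, but the place for "Job B" was not found on the page.
titles = ["Job A", "Job B", "Job C"]
places = ["Nairobi", "New York"]

# zip() silently pairs "Job B" with the place that belongs to "Job C"
# and drops the last title altogether.
zipped = list(zip(titles, places))
print(zipped)  # [('Job A', 'Nairobi'), ('Job B', 'New York')]

# zip_longest() keeps every title and makes the gap visible as None,
# although it still cannot tell which row the missing value belonged to.
padded = list(zip_longest(titles, places, fillvalue=None))
print(padded)  # [('Job A', 'Nairobi'), ('Job B', 'New York'), ('Job C', None)]

# A cheap guard before zipping: compare the list lengths.
lengths_match = len(titles) == len(places)
print(lengths_match)  # False
```

When the lengths differ, locating the fields relative to each job card, rather than across the whole page, keeps the values of one listing together.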
    
undetected Selenium