Python web scraping with Selenium only prints the first element's data

Question

I am using Python with Selenium to scrape job postings from multiple pages, but so far only the first element's information is printed out.

Another difficulty is that the website I am scraping is a job portal for the UN, which means it also aggregates postings from other UN websites. If I want to get the original link, I have to click a position, then click "Apply now", and only then am I redirected to the original UN website.
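One way to handle the "Apply now" step without clicking is to read the link target from the detail page's HTML. The sketch below assumes the detail page contains a plain `<a>` element whose visible text is "Apply now"; the sample markup and the helper names `LinkCollector` and `find_apply_link` are hypothetical, not unjobnet.org's real page or API. The HTML could come from `driver.page_source` after opening a detail page, or from `requests.get(detail_url).text`. If the link points at an intermediate unjobnet.org URL that redirects, fetching it with `requests.get()` and reading `response.url` gives the final address, since `requests` follows redirects for GET by default.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only accumulate text while inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def find_apply_link(detail_html, base_url):
    """Return the absolute URL of the first link labelled "Apply now", or None."""
    parser = LinkCollector()
    parser.feed(detail_html)
    parser.close()
    for href, text in parser.links:
        if text.casefold() == "apply now":
            # Resolve relative hrefs such as "/apply/123" against the page URL.
            return urljoin(base_url, href)
    return None


# Hypothetical detail-page fragment, for illustration only.
sample = '<a href="/jobs">Back</a> <a class="btn" href="/apply/123">Apply now</a>'
print(find_apply_link(sample, "https://www.unjobnet.org/jobs/detail/123"))
```

Matching on link text is fragile if the site changes its wording or language; a stable attribute on the button, if one exists, would be a better hook.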

Thank you so much for reading my question. I have a foreign-policy background and have just started learning web scraping.

#import required packages
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys


#define the driver variable
driver = webdriver.Chrome()

#navigate to url given
driver.get("https://www.unjobnet.org/jobs?orgtypes%5B0%5D=United+Nations+System&apptypes%5B0%5D=Internship&keywords=&orderby=closing")

#wait 5 seconds for elements to load
time.sleep(5)

#locate elements based on the specified XPath
divs = driver.find_elements(By.XPATH,'//*[@id="main"]/div[2]')

#get the text attribute of each element and print it
for div in divs:
    title = div.find_element(By.XPATH,'//*[@id="main"]/div[2]/div/div[1]/div/div[2]/div[1]').text
    area =  div.find_element(By.XPATH,'//*[@id="main"]/div[2]/div/div[1]/div/div[2]/div[2]').text
    place =  div.find_element(By.XPATH,'//*[@id="main"]/div[2]/div/div[1]/div/div[2]/div[4]').text
    deadline = div.find_element(By.XPATH,'//*[@id="main"]/div[2]/div/div[1]/div/div[2]/div[7]/span[2]').text

    print(title, area, place, deadline)

Printed result:

Communications - Intern UNDP - United Nations Development Programme New Delhi (India) Closing soon: 19 Jul 2023
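A side note on the fixed `time.sleep(5)` in the question: it either waits longer than needed or not long enough. Selenium's explicit waits (`WebDriverWait(driver, timeout).until(condition)`, used in the second answer) instead poll a condition, by default every half second, until it returns a truthy value or the timeout expires. The standard-library sketch below only illustrates that polling idea; `wait_until` is a hypothetical helper, not part of Selenium, and the "page" is simulated. With a real driver the condition could be something like `lambda: driver.find_elements(By.XPATH, ...)`, since an empty list is falsy.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value, then return that value.

    Raises TimeoutError if no truthy value appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Simulated page whose job cards "finish loading" on the third check.
checks = {"count": 0}


def job_cards_loaded():
    checks["count"] += 1
    return ["card"] * 20 if checks["count"] >= 3 else []


cards = wait_until(job_cards_loaded, timeout=5, poll_interval=0.01)
print(len(cards), "cards after", checks["count"], "checks")
```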

The xpath expression `//*[@id="main"]/div[2]` found only one matching element. In fact, I don't see how it could **ever** find more than one element. Why were you expecting more than one match? — John Gordon, Jul 19 '23 at 17:14
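The comment identifies the first problem: `//*[@id="main"]/div[2]` selects the single container, not the individual job cards, so the loop runs once. Once the outer locator matches every card, a second problem appears: an XPath that begins with `//` is evaluated from the document root even when called as `div.find_element(...)`, so each iteration would still return the first card's text. Prefixing the inner path with a dot (`.//...`) makes it relative to the current element. The sketch below shows this per-card pattern with the standard library's `xml.etree.ElementTree` on hypothetical markup; the class names `job`, `title`, and `place` are assumptions, not unjobnet.org's real structure. In Selenium the same shape would be `driver.find_elements(By.XPATH, <card locator>)` followed by `card.find_element(By.XPATH, ".//...")`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: one container (#main) holding several job
# cards. The class names are assumptions, not unjobnet.org's real markup.
PAGE = """
<div id="main">
  <div class="job">
    <div class="title">Communications - Intern</div>
    <div class="place">New Delhi (India)</div>
  </div>
  <div class="job">
    <div class="title">INTERN [Temporary]</div>
    <div class="place">New York (United States)</div>
  </div>
</div>
"""


def extract_jobs(html):
    """Return (title, place) for every job card, looked up card by card."""
    root = ET.fromstring(html.strip())
    jobs = []
    # Select every card, not the single container that holds them all.
    for card in root.iterfind(".//div[@class='job']"):
        # The leading "." keeps each lookup inside the current card.
        title = card.find(".//div[@class='title']").text
        place = card.find(".//div[@class='place']").text
        jobs.append((title, place))
    return jobs


for title, place in extract_jobs(PAGE):
    print(title, "|", place)
```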

score 1 · Answer 1 · answered Jul 19 '23 at 18:10

You can skip Selenium entirely and load the data directly from the site's pagination API (the page fetches its job list from the server as JSON):

import requests
import pandas as pd
from bs4 import BeautifulSoup

# Same URL and filters as the browser page; the job list is requested page by page
api_url = "https://www.unjobnet.org/jobs"
params = {
    "orgtypes[0]": "United Nations System",
    "apptypes[0]": "Internship",
    "keywords": "",
    "orderby": "closing",
    "page": "1",
}

# Mark the request as AJAX so the server returns the JSON that .json() below expects
headers = {'X-Requested-With': 'XMLHttpRequest'}

all_dfs = []

for params['page'] in range(1, 3):  # <-- increase number of pages here
    data = requests.get(api_url, params=params, headers=headers, timeout=30).json()
    all_dfs.append(pd.DataFrame(data['jobs']))

df = pd.concat(all_dfs)
# The Deadline field contains HTML markup; keep only its text
df['Deadline'] = df['Deadline'].apply(lambda x: BeautifulSoup(x, 'html.parser').text)
df.pop('LocationFlags')

print(df)

Prints:

JobID	Title	Department	Grade	Level	CitiesCountries	AppType	Logo	ShortName	LongName	Added	Updated	Deadline	DatePosted
60526362	Communications - Intern		Intern	Intern	New Delhi (India)	Internship	1617066589_f1b878cccc2893c87abf.png	UNDP	United Nations Development Programme	1 week ago	2023-07-19 14:06:04	Closing soon: 19 Jul 2023	2023-07-11 14:01:03
60586283	Internship Paid- Communications Support			Intern	Kingston (Jamaica)	Internship	1617066589_f1b878cccc2893c87abf.png	UNDP	United Nations Development Programme	6 days ago	2023-07-19 14:01:03	Closing soon: 19 Jul 2023	2023-07-12 23:01:04
60080967	INTERN [Temporary]	Department for General Assembly and Conference Management	I-1	Intern	New York (United States)	Internship	2bfa093bb99a69d9585819674270ef76.svg	UN DGACM	Department for General Assembly and Conference Management	2 weeks ago	2023-07-19 14:04:08	Closing soon: 19 Jul 2023	2023-06-30 14:04:08
60568657	Internship Paid - Social Resilience & Inclusion			Intern	Kingston (Jamaica)	Internship	1617066589_f1b878cccc2893c87abf.png	UNDP	United Nations Development Programme	1 week ago	2023-07-19 11:01:03	Closing soon: 19 Jul 2023	2023-07-12 14:01:04
60278375	INTERN - Data Analyst & Management [Temporary]	United Nations Human Settlements Programme	I-1	Intern	Nairobi (Kenya)	Internship	6bb959ad3e762845c59a3a8808e1871b.jpg	UN-HABITAT	United Nations Human Settlements Programme	2 weeks ago	2023-07-19 14:04:08	Closing soon: 19 Jul 2023	2023-07-05 14:04:07
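A dependency-light sketch of the same post-processing, assuming (as the answer's code implies) that each response is a JSON object with a `jobs` list whose items carry an HTML-formatted `Deadline` and a `LocationFlags` field. Instead of hard-coding `range(1, 3)`, the hypothetical `iter_jobs` helper keeps requesting pages until one comes back empty or `max_pages` is reached; that stopping rule is an assumption about the API, which is why the cap stays. `fetch_page` is any callable returning the parsed JSON for a page number, so the logic can be tried offline. With `requests` it could be `lambda p: requests.get(api_url, params={**params, "page": p}, headers=headers, timeout=30).json()`. The `html.parser` module replaces BeautifulSoup here, and the sample data is made up.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Accumulate only the text content of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()


def iter_jobs(fetch_page, max_pages=50):
    """Yield cleaned job dicts page by page until a page has no jobs."""
    for page in range(1, max_pages + 1):
        jobs = fetch_page(page).get("jobs") or []
        if not jobs:
            break
        for job in jobs:
            job = dict(job)  # copy so the caller's data is not mutated
            job.pop("LocationFlags", None)
            if isinstance(job.get("Deadline"), str):
                job["Deadline"] = html_to_text(job["Deadline"])
            yield job


# Offline stand-in for the API: two pages of made-up data, then an empty page.
FAKE_PAGES = {
    1: {"jobs": [{"JobID": 1, "Title": "Intern A", "LocationFlags": "x",
                  "Deadline": '<span class="text-danger">Closing soon:</span> <span>19 Jul 2023</span>'}]},
    2: {"jobs": [{"JobID": 2, "Title": "Intern B", "Deadline": "<span>25 Jul 2023</span>"}]},
}

rows = list(iter_jobs(lambda p: FAKE_PAGES.get(p, {"jobs": []})))
for row in rows:
    print(row["JobID"], row["Title"], row["Deadline"])
```

The resulting list of dicts can be passed straight to `pd.DataFrame(rows)` if a table is still wanted.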

score 0 · Answer 2 · answered Jul 19 '23 at 21:12

To scrape the job listings from the page, use WebDriverWait with visibility_of_all_elements_located() so that each set of elements is visible before it is read, collect each field into its own list, and pair the lists up with zip(). You can use the following Locator Strategies:

Code block:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.unjobnet.org/jobs?orgtypes%5B0%5D=United+Nations+System&apptypes%5B0%5D=Internship&keywords=&orderby=closing")
titles = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[starts-with(@href, '/jobs/detail')]")))]
areas = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[@class='link-dark' and starts-with(@href, '/organizations')]")))]
places = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[starts-with(@href, '/jobs?locations')]//parent::div[1]")))]
deadlines = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[starts-with(@class, 'text-danger')]")))]
for title, area, place, deadline in zip(titles, areas, places, deadlines):
    print(f"Title {title} at {area} in {place} closing on {deadline}")
driver.quit()

Console output:

Title Communications - Intern at UNDP - United Nations Development Programme in New Delhi (India) closing on Closing soon: 19 Jul 2023
Title UN·E ASSISTANT·E FORMATION & COMMUNAUTÉ PROFESSIONNELLE (MEAL) EN STAGE at Action contre la Faim in France closing on Closing soon: 19 Jul 2023
Title Internship Paid- Communications Support at UNDP - United Nations Development Programme in Kingston (Jamaica) closing on Closing soon: 19 Jul 2023
Title INTERN [Temporary] at UN DGACM - Department for General Assembly and Conference Management in New York (United States) closing on Closing soon: 19 Jul 2023
Title Internship Paid - Social Resilience & Inclusion at UNDP - United Nations Development Programme in Kingston (Jamaica) closing on Closing soon: 19 Jul 2023
Title INTERN - Data Analyst & Management [Temporary] at UN-HABITAT - United Nations Human Settlements Programme in Nairobi (Kenya) closing on Closing soon: 19 Jul 2023
Title INTERN - ENVIRONMENT AFFAIRS [Temporary] at UNEP - United Nations Environment Programme in Nairobi (Kenya) closing on Closing soon: 19 Jul 2023
Title Internship Paid - Programme Support at UNDP - United Nations Development Programme in Kingston (Jamaica) closing on Closing soon: 19 Jul 2023
Title Communications and Data Analysis Intern at UNDP - United Nations Development Programme in Minsk (Belarus) closing on Closing soon: 19 Jul 2023
Title Health, Safety, Social and Environmental (HSSE) Management Intern at UNOPS - United Nations Office for Project Services in Copenhagen (Denmark) closing on Closing soon: 19 Jul 2023
Title INTERN - PUBLIC INFORMATION [Temporary] at UN - United Nations Secretariat in Quito (Ecuador) closing on Closing soon: 19 Jul 2023
Title RBA Front Office Intern (in-person) - 2 positions at UNDP - United Nations Development Programme in New York (United States) closing on Closing soon: 20 Jul 2023
Title Internship – Aviation security policy support [Temporary] at ICAO - International Civil Aviation Organization in Montreal (Canada) closing on Closing soon: 20 Jul 2023
Title INT 2023 22 - Intern - Software Development (Displacement Tracking Matrix (DTM)) - Geneva, Switzerland at IOM - International Organization for Migration in Geneva (Switzerland) closing on Closing soon: 20 Jul 2023
Title Stagiaire_Protection de l'Enfant, Bujumbura-Burundi, 6 mois (exclusivement pour les Nationaux) at UNICEF - United Nations Children's Fund in Bujumbura (Burundi) closing on Closing soon: 20 Jul 2023
Title Communications Intern at UNOPS - United Nations Office for Project Services in Tirana (Albania) closing on Closing soon: 20 Jul 2023
Title INTERN - PROGRAMME MANAGEMENT [Temporary] at UN - United Nations Secretariat in New York (United States) closing on Closing soon: 20 Jul 2023
Title INTERN - Communications [Temporary] at UNJSPF - United Nations Joint Staff Pension Fund in New York (United States) closing on Closing soon: 20 Jul 2023
Title Consultants to support the technical assistance for the development of teaching manuals and internship guide for the vocational modules of the the specialized nursing qualification in neonatolog at WHO - World Health Organization in Maputo (Mozambique) closing on Closing soon: 20 Jul 2023
Title INTERN - PUBLIC INFORMATION (Chinese News) [Temporary] at UN - United Nations Secretariat in New York (United States) closing on Closing soon: 20 Jul 2023
