Web scraping data from an Ajax page

Question

I am attempting to scrape job titles from here.

I am learning Python scraping techniques, but I am stuck on scraping an Ajax page like this one. With the code below I can get the same response data that the developer tools show for the first page. How do I extract the job titles from this data?

from bs4 import BeautifulSoup
import requests

s = requests.Session()
headers = {"User-Agent": "Mozilla/5.0"}
r = s.get('https://epco.taleo.net/careersection/alljobs/jobsearch.ftl?lang=en', headers=headers)
html = r.text
soup = BeautifulSoup(html, 'lxml')
print(soup)

# How do I extract job titles from soup?

Would really appreciate any help on this.

Unfortunately, I am currently limited to using only requests or another popular Python library. Thanks in advance.

Does this answer your question? [Web-scraping JavaScript page with Python](https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python) — ggorlen, Jul 23 '21 at 17:47

score 1 · Answer 1 · answered Jul 20 '21 at 17:53

This site is dynamic (its data is loaded and changed with JavaScript), so you can use Selenium to render it. You can run the browser headless, so it behaves much like sending requests:

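from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument('--headless')

# Selenium 4 takes the driver path through a Service object and the options
# through `options=`; executable_path and chrome_options are deprecated there.
driver = webdriver.Chrome(service=Service(r'yourpath\chromedriver.exe'), options=options)

driver.get('https://epco.taleo.net/careersection/alljobs/jobsearch.ftl?lang=en')

# page_source is already a str, so it can be passed to BeautifulSoup directly
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')
print(soup)

# Close the headless browser once the page has been captured
driver.quit()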

score 0 · Answer 2 · answered Jul 20 '21 at 17:50

The data is within a <script> tag. You can use the re module to find the job titles.

import re
import requests

headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get(
    "https://epco.taleo.net/careersection/alljobs/jobsearch.ftl?lang=en",
    headers=headers,
)
job_titles = re.findall(r"Add this position to the job cart: (.*?)'", response.text)
print(len(job_titles))
print(job_titles)

Output:

25
['Technician, I %26 E (Coyanosa, TX)', 'Engineer, Senior Project', 'Engineer, Project', 'Mechanic, Truck (Monahans, TX)', 'Technician, Pipeline (Bryan/College Station)', 'Technician, Measurement (Farmington, NM)', 'Assistant, Field Administrative (Carlsbad, NM)', 'Technician, Pipeline (Greensburg, PA)', 'Human Resources Business Partner', 'Engineer, Senior Measurement', 'Accountant (Mont Belvieu)', 'Specialist, Senior Accounts Payable', 'Technician, Pipeline Trainee( Cape Girardeau)', 'Specialist, EAM Inventory', 'Welder - Class B', 'Specialist, Senior NGL Accounts Payable', 'Technician, Pipeline (Hobbs, NM)', 'Auditor, IT', 'Accountant, Intermediate', 'Accountant', 'Operator, Plant (Sonora, TX)', 'Technician, Pipeline (Carlsbad, NM)', 'Specialist, Maintenance (Lebanon, OH)', 'Technician, Pipeline Trainee ', 'Specialist, Senior Systems']
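Some titles in that output are still percent-encoded (`%26` stands for `&`), and at least one has trailing whitespace. A minimal clean-up sketch follows; the helper name `clean_titles` is made up for illustration, and it assumes the titles only need URL-decoding and trimming:

```python
from urllib.parse import unquote


def clean_titles(raw_titles):
    """Percent-decode each title and trim surrounding whitespace."""
    return [unquote(title).strip() for title in raw_titles]


# Two titles copied from the output above
sample = ["Technician, I %26 E (Coyanosa, TX)", "Technician, Pipeline Trainee "]
print(clean_titles(sample))
# ['Technician, I & E (Coyanosa, TX)', 'Technician, Pipeline Trainee']
```

Calling `clean_titles(job_titles)` on the list from the snippet above would apply the same clean-up to all 25 titles.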

Thanks @MendelG, could you suggest how I should proceed to extract titles from the remaining pages from the [Ajax link](https://epco.taleo.net/careersection/alljobs/jobsearch.ajax)? — Dan, Jul 22 '21 at 13:50

score 0 · Answer 3 · answered Jul 20 '21 at 17:55

Try:

import re
import json
import requests

url = "https://epco.taleo.net/careersection/alljobs/jobsearch.ftl?lang=en"

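# The job list is embedded in the page's JavaScript as one flat array
data = re.search(r"listRequisition', (\[.*?\])\);", requests.get(url).text)
# Swap single quotes for double quotes so the array parses as JSON
data = data.group(1).replace("'", '"')
data = json.loads(data)
# Each of the 25 jobs takes up 40 consecutive items; the title is item 5
for i in range(25):
    row = data[i * 40 : (i + 1) * 40]
    print(row[5])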

Prints:

Technician, I %26 E (Coyanosa, TX)
Engineer, Senior Project
Engineer, Project
Mechanic, Truck (Monahans, TX)
Technician, Pipeline (Bryan/College Station)
Technician, Measurement (Farmington, NM)
Assistant, Field Administrative (Carlsbad, NM)
Technician, Pipeline (Greensburg, PA)
Human Resources Business Partner
Engineer, Senior Measurement
Accountant (Mont Belvieu)
Specialist, Senior Accounts Payable
Technician, Pipeline Trainee( Cape Girardeau)
Specialist, EAM Inventory
Welder - Class B
Specialist, Senior NGL Accounts Payable
Technician, Pipeline (Hobbs, NM)
Auditor, IT
Accountant, Intermediate
Accountant
Operator, Plant (Sonora, TX)
Technician, Pipeline (Carlsbad, NM)
Specialist, Maintenance (Lebanon, OH)
Technician, Pipeline Trainee 
Specialist, Senior Systems
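The snippet above hard-codes 25 rows. As a rough sketch of the same idea without that assumption, the function below derives the row count from the array length. The name `extract_column` and the tiny `sample_page` string are invented for illustration; the row width of 40 and title index of 5 are taken from the code above and may not hold if the site changes its layout:

```python
import json
import re


def extract_column(page_text, row_width=40, column=5):
    """Return one column from the flat 'listRequisition' array in the page."""
    match = re.search(r"listRequisition', (\[.*?\])\);", page_text)
    if match is None:
        return []  # the array was not found in this page
    # Same quote swap as above; it would break on values containing quotes
    flat = json.loads(match.group(1).replace("'", '"'))
    # Step through the flat list one complete row at a time
    return [
        flat[start + column]
        for start in range(0, len(flat) - row_width + 1, row_width)
    ]


# Made-up stand-in for the page source: two rows of three items each
sample_page = "fill('listRequisition', ['a','b','c','d','e','f']);"
print(extract_column(sample_page, row_width=3, column=1))
# ['b', 'e']
```

With the real page, `extract_column(requests.get(url).text)` should return the same titles as the loop above, assuming the embedded array keeps that shape.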

Thanks @andrej, could you suggest how I should proceed to extract titles from the remaining pages from the [Ajax link](https://epco.taleo.net/careersection/alljobs/jobsearch.ajax)? — Dan, Jul 22 '21 at 17:58
