I am trying to get the salary from this web page, but every time I get the same value, "None", even though I have tried several different tags.

import requests
from bs4 import BeautifulSoup

link_content = requests.get("https://wuzzuf.net/jobs/p/KxrcG1SmaBZB-Facility-Administrator-Majorel-Egypt-Alexandria-Egypt?o=1&l=sp&t=sj&a=search-v3")
soup = BeautifulSoup(link_content.text, 'html.parser')
salary = soup.find("span", {"class":"css-47jx3m"})
print(salary)

output:

None

1 Answer

The page is generated dynamically with JavaScript, so Requests does not see it the way your browser does. Try disabling JavaScript in your browser and hard-reloading the page, and you will see that a lot of information is missing. However, the data does exist in the page, inside a script tag. One way of getting it is to slice the text of that script tag down to the part you need [EDITED to account for different encoded keys; it should now work for any job]:

import requests
from bs4 import BeautifulSoup as bs
import json
import pandas as pd


pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
}

url = 'https://wuzzuf.net/jobs/p/KxrcG1SmaBZB-Facility-Administrator-Majorel-Egypt-Alexandria-Egypt?o=1&l=sp&t=sj&a=search-v3'

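soup = bs(requests.get(url, headers=headers).text, 'html.parser')
# The first <script> tag assigns the page state to Wuzzuf.initialStoreState;
# cut out the JSON between that assignment and the next one.
script_text = soup.select_one('script').text
state_json = script_text.split('Wuzzuf.initialStoreState = ')[1].split('Wuzzuf.serverRenderedURL = ')[0].rsplit(';', 1)[0]
data = json.loads(state_json)['entities']['job']['collection']
# The collection is keyed by an encoded job id that differs per job, so take the first key.
enc_key = next(iter(data))
df = pd.json_normalize(data[enc_key]['attributes']['salary'])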
print(df)

Result in terminal:

    min   max currency period additionalDetails  isPaid
0  None  None     None   None              None    True
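
If the chained split calls feel fragile, here is a minimal sketch of the same idea using only the standard library. It assumes the page still contains the text "Wuzzuf.initialStoreState = " followed directly by a JSON object. The helper names extract_initial_state and job_salary, and the sample_html string, are made up for illustration and are not part of the site or of any library. json.JSONDecoder().raw_decode parses one JSON value starting at a given index and ignores whatever comes after it, so the trailing ";" and the next assignment do not need to be cut off by hand.

```python
import json

MARKER = "Wuzzuf.initialStoreState = "  # assumed to precede the JSON in the page


def extract_initial_state(html: str, marker: str = MARKER) -> dict:
    """Return the JSON object assigned right after `marker` in `html`."""
    start = html.find(marker)
    if start == -1:
        raise ValueError("marker not found in page source")
    # raw_decode parses a single JSON value beginning at the given index
    # and returns (value, end_index); text after the value is ignored.
    state, _end = json.JSONDecoder().raw_decode(html, start + len(marker))
    return state


def job_salary(state: dict) -> dict:
    """Return the salary dict of the first job in the state's job collection."""
    collection = state["entities"]["job"]["collection"]
    job = next(iter(collection.values()))  # keyed by an encoded id; take the first
    return job["attributes"]["salary"]


# Hypothetical, heavily trimmed stand-in for the real page source.
sample_html = (
    '<script>Wuzzuf.initialStoreState = '
    '{"entities": {"job": {"collection": {"abc123": {"attributes": '
    '{"salary": {"min": null, "max": null, "isPaid": true}}}}}}};'
    'Wuzzuf.serverRenderedURL = "/jobs/p/example";</script>'
)

salary = job_salary(extract_initial_state(sample_html))
print(salary)
```

With a real page you would pass requests.get(url, headers=headers).text instead of sample_html. The ValueError makes it obvious when the marker is missing, for example if the site changes its markup.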
Barry the Platipus